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ABSTRACT 


We  consider  the  Fourier  transform  of  a positive  function  f(*)  (or 
its  sample  Fourier  transform;  as  a possibly  complex  covariance  function 
of  a hypothetical  stationary  coop lex- valued  time  series.  We  model  this 
time  series  by  an  autoregressive  process  of  order  p whose  spectral 
density  approximates  (or  estimates)  the  function  f(*)  . 

We  show  the  equivalence  of  this  interpretation  with  the  theory  of 
orthogonal  polynomials  on  the  unit  circle;  we  study  the  consistency  of 
the  autoregressive  estimator  as  p increases  with  the  sample  size. 

We  also  make  an  exploratory  investigation  of  this  new  method  as  a 
density  estimation  method  following  throe  approaches:  the  direct  approach 

the  hazard  approach  and  the  sparsity  approach. 
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SUMMARY 


What  has  been  done: 

In  Part  I,  we  look  at  the  autoregressive  method  from  the  practi- 
cal standpoint.  Our  aim  is  twofold: 

— to  find  out  how  the  autoregressive  method  behaves  In  many 
different  situations,  when  the  true  answer  is  either  known  or  unknown  • 

— to  compare  the  autoregressive  method  to  other  methods  of 

curve  estimation  currently  used:  the  kernel  method,  the  spline  method, 

the  orthogonal  series  method  and  a quantile  expansion  method. 

It  is  properly  impossible  to  summarize  this  kind  of  work.  One 
can  form  one's  opinion  only  by  reading  the  text  and  looking  at  the 
pictures  we  produced.  Our  opinion  is  that  the  method  is  very  versatile 
it  always  yields  a positive  function;  it  is  very  easy  to  use  and  it 
works  well! 

In  Part  II,  we  look  at  the  autoregressive  method  from  the 
theoretical  standpoint. 

In  Chapter  3,  we  unify  the  time  series  interpretation  and  the 
orthogonal  polynomial  interpretation  of  the  autoregressive  method. 

From  the  time  series  point  of  view,  our  treatment  is  slightly  more 
general  than  the  current  practice  in  that  our  autoregressive  coeffi- 
cients are  complex  numbers. 

Chapters  4 and  5 are  parallel. 


In  Chapter  4,  we  study  the  different  modes  of  convergence  of 
the  autoregressive  -ethod,  that  is  the  autoregressive  method  as  an 
approximation  method. 

The  weakest  result  is  Theorem  4.1: 
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The  strongest  result  is  Theorem  4.3: 
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if 


(2) 


condition  (1)  holds 

0 < m £ f(x)  £ M , 

a.e.  in  [-tt,tt] 

f(*) 

= g(*)  , 

a.e.  in  [-TT,TT] 

g(*) 

e Lip  (a, 2) 
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then 
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, uniformly 


In  Chapter  5,  we  prove  the  consistency  cf  the  autoregressive 
method  as  in  Theorem  5.9: 
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in  probability  uniformly. 


Finally  in  Chapter  6,  we  apply  these  different  results  to  the 
problem  of  density  estimation. 


What  has  to  be  done: 

In  the  theory: 

We  have  to  find  the  asymptotic  distribution  of  the  autoregressive 
estimator  when  we  allow  the  order  p to  increase  with  the  sample  size  at 
a given  rate.  Berk  (1974)  has  worked  in  this  direction.  Vte  have 
weakened  some  of  his  assumptions  for  the  consistency  of  the  autoregress- 
ive coefficients  (see  section  5.2),  but  we  haven't  touched  the  other 


problem. 


Then  will  come  the  problem  of  finding  global  confidence  bounds. 
We  also  need  a criterion  to  choose  the  order  p . 

In  the  applications: 

The  most  important  task  here  is  probably  to  justify  rigorously 
the  use  of  the  autoregressive  method  in  those  cases  where  F(*)  is 
unbounded  (see  section  6.2). 

He  have  not  touched  the  problem  of  estimating  the  intensity 
function  of  a counting  process. 
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INTRODUCTION 

.n  statistics1  analysis  we  are  often  interested  in  estimating 
curves,  probably  because  of  the  heavy  emphasis  on  visual  training  in  our 
cultures.  The  curves  are  not  always  drawn  b”t  they  could  be  as,  for 
example,  in  regression  analysis:  now  people  more  routinely  take  a look 

at  residual  plots  to  judge  the  fit  to  the  data  provided  by  the  regression 
line  (e.g.,  Tukey  (1970,,  Feder  (1974)), 

Actually,  there  is  increased  interest  in  drawing  the  curves, 
thanks  to  computer  graphics.  A few  people  ha\  very  imaginatively  pro- 
posed -nd  developed  new  ways  of  visualizing  the  data  providing  statisti- 
cal scientists  with  new  means  to  gain  insight  into  the  data  and  to  convey 
these  insights  to  their  clients  (e.g.,  Andrews  (1972),  Chemoff  (1973), 
Cleveland  and  Kleiner  (1975)  ). 

After  a long  domination  of  parametric  techniques,  we  can  now  let 
the  data  speak  for  itself.  That  is  what  the  so-called  non-parametric 
techniques  are  attempting  to  do.  There  are  already  several  non-parametric 
estimators  of  curves  like  probability  density  functions,  hazard  functions, 
intensity  functions,  ... 

A universal  requirement  seems  to  be  the  smoothness  of  the  estima- 
tors. Indeed,  smoothness  allowj  easy  integration  and  differentiation, 
when  required.  Estimators  should  also  belong  to  the  class  of  functions 
they  are  trying  to  estimate.  Finally  the  methods  of  estimation  should 
be  easy  to  use. 

We  review  briefly  the  general  methods  that  have  appeared  in  the 
literature;  then  we  expose  the  new  autoregressive  method. 
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But  first  a word  on  the  notation.  We  will  stick  to  the  following 
conventions: 

— a function  g(*)  is  approximated  by  a function  8m(*)  > 
where  m denotes  the  order  of  approximation; 

A 

— an  approximator  g (•)  is  estimated  by  a function  g (•)  ; 

m m 

— a function  g(*)  is  estimated  directly  from  a sample  by  a 
function  gn(*)  » where  n denotes  the  sample  size. 

Also,  in  the  appendix  to  this  chapter  one  will  find  a glossary  of  the 
terms  that  are  followed  in  the  text  by  an  asterisk. 
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The  Kernel  Method 
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The  kernel  method  was  introduced  and  developed  by  Rosenblatt  (1956) 
and  Parzen  (1962).  It  gives  a general  way  of  estimating  derivatives  of 
functions  by  smoothing  the  first  differences  of  a crude  estimator  of 
these  functions  (usually  a step  function),  using  a weight  function 
called  a kernel. 


Let  f (*)  - F'(»)  . Suppose  F(»)  has  been  estimated  at  n 
points  x^,...,x^  (the  data  points)  by  FnC#)  • Then  f(*)  is  estimated 

by  fn^  : 


V1!  - 


< 

0)1  > 

«■  * 


where 


K(*)  , the  kerne 1^  and  {h^}  are  chosen  appropriately 
Fn(x^  + 0)  - Fq(x^  “ 0)  is  the  value  of  the  jump  of 
Fn(»)  at  x^  . 


Parzen  and  Rosenblatt  were  interested  in  estimating  a probability 
density  function  (*).  Then  Watson  and  Leadbetter  (1964)  applied 
indirectly  the  same  technique  to  hazard  functions  (*)  . It  could  also 
be  used  for  estimating  sparsity  function  (*)  or  intensity  functions  (*) 
of  counting  processes;  these  applications  have  not  been  studied  yet. 
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The  Quantile  Expansion  Method 

This  method  is  due  to  G.  P.  Sillitto  (1969,  1971).  It  has  to  do 
with  estimating  the  quantile  function  (*),  using  shifted  Legendre  ortho- 
gonal polynomials.  Ftom  the  derivative  of  the  quantile  function,  the 
sparsity  function,  one  can  easily  get  the  density  function. 

Let  F(*)  be  a strictly* increasing  continuous  distribution 
function  (*) , and  Q(*)  the  associated  quantile  function. 

Let  q(«)  = Q'(»)  and  f(»)  * F’(*)  . Then 

£(X)  " q(F(x»  and  q(t)  = f(Q(t)) 

The  quantile  function  is  estimated  by 


where  


m 


- E (23  - X)  Xj  Pj., 


(t) 


Pj  ^ is  the  shifted  (to  [0,1])  Legendre  polynomial  of 
degree  (j  - 1) 

A 

Xj  is  a linear  combination  of  the  order  statistics. 

A A 

then  computes  q^*)  and  estimates  f(«)  by  fm(0  • 


A A 


qm(t) 


One 
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0.3  The  Spline  Method 

In  the  practice  of  density  estimation,  there  seems  to  be  different 
approaches  placed  in  this  category.  Boneva,  Kendall  and  Stefanov's 
method  (1971)  is  really  a variant  of  the  kernel  method  whers  the  chosen 
kernel,  called  a deltaspline,  satisfies  certain  extremal  properties 
arrived  at  through  an  original  application  of  the  theory  of  splines. 

More  properly  (as  to  the  classification),  one  can  smooth  the 
empirical  quantile  function  or  the  empirical  distribution  function  using 
spline  functions  and  ti;en  differentiate  these  smooth  versions,  as  in 
tfahba  (1971). 

A general  description  of  this  second  approach  runs  as  follows: 

k 

given  an  ordered  set  of  knots  » a corresponding  set  of  estimated 

functional  values  , find  a real-valued  function  F(*)  such  that 

F(xi)  “ yi  » i « l,...,k 

and  other  appropriate  conditions  are  satisfied.  These  other  conditions 
define  the  class  of  spline  functions  to  be  used,  e.g.,  cubic  splines, 
splines  under  tension,  ... 

We  have  considered  splines  under  tension  where  the  extra  condition 

2 

is  that  given  a > 0 (the  tension  factor),  F"(*)  - o F(*)  varies 
linearly  on  each  of  the  intervals  3 > i ■ l,...,k-l  . 

This  description  was  borrowed  from  Cline  (1974  ).  The  imposed 

k 

linearity  condition  allows  estimation  of  the  {y^}^=^  by  least-squares 


C 


method. 
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The  Weighted  Fourier  Series  Method 


Let  f(»)  be  nonnegative  and  defined  on  [-TT,TT]  . Then 

TT  . 

f*  ivx 

cp(v)  * j e f(x)  dx  is  determined  by  its  value  at  v=  0,1,2,... 

-TT 

Now  if  f{»)  is  square- integr able  (*),  we  have  the  following  inversion 
formula: 


£ e"ivXcp(v) 

ys  -00 

TT  . 

If  f(«)  is  unknown,  we  can  estimate  cp(v)  by  cp  (v)  » j e1VXd  F 

n * n 

-TT 

where  Fq(*)  is  an  estimator  of  the  cumulative  of  f(*)  , and  then 
invert  a weighted  version  of  <Pn(*)  to  get  fn(*) 

09 

fn(x)  £ e"ivxw(v)  cp  (v) 

v=  -® 

where  w(v)  goes  to  0 as  Jvj  gets  large.  Watson  (1969)  has  deter- 
mined that  the  optimal  weights  are 


„(v)  . !»CyJ|2 

!<P(V)|2  +5  (1  - |tp(v)|2) 

Some  have  used  0-1  weights  (e.g.,  Kronmal  and  Tarter  (1968)). 
Thaler  (1974)  has  modeled  the  optimal  weights  from  the  sample. 

Another  approach  that  we  favor  is  to  use  truncation  along  with 
estimated  optimal  weights: 


f (*) 

Cl 


2TT 


S e'ivXwn(v)  cpn(v) 


/ 


0.5  The  Autoregressive  Method 


We  now  come  to  the  main  object  of  this  dissertation.  The  auto- 
regressive method  got  its  name  from  time  series  analysis. 

Let  F(»)  be  a bounded  nondecreasing  function  defined  on 
(-tt,tt]  . Let  R(»)  be  the  Fourier-Stieltjes  transform  of  F(»)  , 


R(v)  = J*  eivXdF(x)  , jvj  « 0,1,2,... 


Solve  the  following  system  (Yule-Walker  equations)  * 


R(-l)  R(0) 


R(-m  + 1)  R(-m  + 2>  . . 


R(m  - 1) 
R(m  - 2) 


K(-2) 


R(-m) 


This  can  be  seen  either  as  fitting  an  m order  autoregressive 
scheme  (*)  from  the  Yule-Walker  equations  involving  a complex  stationary 
covariance  function  R(«)  , or  as  building  a set  of  orthogonal  polynomials 
on  the  unit  circle,  with  basis  the  complex  exponentials  and  inner  product 
defined  by 


(g,h)  - J g(eiX)  h(eiy)  d F(x) 


The  autoregressive  approximator  f (•)  is  given  by 


f (x) 
m 


i1+ai 


i 

ix  , 2ix 
e +n^  e 


, mix. ' 

1 


- s - 


and  is  such  that 


J eivxf  (x)  dx  = R(v)  , jvj  = 0,1, ...  ,m 
-TT 

Thus  £^(0  is  approximating  in  a certain  way. 

The  autoregressive  estimator  is  obtained  in  the  same  way,  except 
that  R(*)  has  to  be  estimated  first  by  R (•)  . 
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Appendix 

O.A.l  Glossary  of  Terms 

1 - A distribution  function  F(»)  is  a nondecreasing  function  continu- 

ous from  the  right  and  such  that  lim  F(x)  ==  0 and  lim  F(x)=*l  . 

x ->  -®  x~*  +® 

We  will  usually  consider  that  F(*)  is  absolutely  continuous  with 
respect  to  Lebesgue  measure,  i.e.,  we  will  assume  the  existence  of 
a function  f(«)  , called  a density  function,  such  that 
F(x)  * J f(u)  du  . Then,  f(«)  * Ff(»)  a.e.  Sometimes,  we 

—09  SO 

P 2 

assume  that  f(»)  is  square  integrable.  i.e.  J 1 f (u.)  J du  < » . 

2 - Suppose  that  the  distribution  function  F(»)  is  strictly  increasing 

and  absolutely  continuous.  Then  we  can  define  the  functional  in- 
verse of  F(*)  , denoted  Q(»)  , called  the  quantile  function.  Q(*) 
is  defined  on  [0,1]  and  if  t **  F(x)  , then  Q(t)  * x . Under 
the  preceding  conditions,  there  exists  a function  q(»)  , called 
the  sparsity  function,  such  that  q(»)  * Q'(*)  • Now  if  we  let 
f\*>  * F‘(»)  , we  will  have  the  following  relations: 

£(x>  " 5(fw)  sn,i  q(t)  * 7mm 

Tukey  may  have  been  the  first  one  to  use  the  term  "sparsity." 

3 - Now  let  F(*)  be  an  absolutely  continuous  distribution  function 

defined  on  [0,®)  . The  hazard  function  *(•)  is  defined  as 

h(x)  - ~ log(l  - F(x))  » 


c 


f(x) 

1 - F(x) 


It  is  interpreted  as  an  instantaneous  failure  rate 


lim 
dx->  0 


P(x  < T s x + dx  I T > x) 
dx 


h(x) 


where  T is  a random  variable  with  distribution  F(») 

For  a counting  process  N(t)  , we  can  define  the  concept  of  an 
intensity  function  X(*)  as 


X(t)  - ~ E[N(t)  ] 

where  £{•]  denotes  the  expectation  of  a random  variable. 

An  mfck  order  autoregressive  scheme  is  a stochastic  process  satis- 
fying the  following  difference  equation 


X(t)  + o^XCt- !)  + ...  + o^XCt-m)  = e(t) 


where  the  e(*)‘s  are  uncorrelated. 
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WLen  one  is  presented  with  a new  and  radically  different  way  of 
doing  something  for  which  there  exists  already  quite  a few  techniques, 
one  is  naturally  cautious.  There  is  a natural  principle  of  economy  that 
needs  to  be  respected  before  one  may  yield  his  approval.  In  the  next 
two  chapters  we  try  to  demonstrate  in  the  most  practical  way  that  the 
autoregressive  method  deserves  to  become  a standard  technique  of  non- 
parametric  curve  estimation. 

In  the  genesis  of  this  work,  the  contents  of  these  first  two 
chapters  opened  the  way  to  several  questions  to  which  much  attention 
will  be  devoted  in  the  second  part.  Also,  in  all  fairness  tc  the  com- 
peting techniques,  we  include  some  practical  suggestions  that  make  them 
perform  better. 


CHAPTER  1 


APPROXIMATING  DENSITIES  AND  HAZARD  FUNCTIONS 
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Most  of  the  time,  people  use  simulation  to  validate  a new  method. 
In  the  words  of  Boneva,  Kendall  and  Stefanov  (1971,  p.  I):  "We  have  for 

the  present  worked  chiefly  with  simulated  material  for  the  excellent 
reason  that  what  is  essentially  a diagnostic  aid  is  most  severely  tested 
when  one  knows  what  the  answer  should  be."  In  simulation,  the  answer  is 
not  exactly  known.  As  a matter  of  fact,  the  sampling  variability  of  the 
simulation  process  can  yield  "bad"  samples.  Should  we  feel  free  to 
sample  until  we  get  the  "right"  answer?  This  liberty  is  not  available 
in  the  real  data  case.  What  can  we  do? 

It  depends  on  what  the  method  can  do.  If  the  method  can  be  used 
in  a non- stochastic  context,  then  it  is  possible  to  validate  it  without 
reference  to  sampling  fluctuations.  We  call  this  process  a first-hand 
validation.  If  the  method  is  inseparable  from  the  stochastic  context, 
we  may  obtain  consensus  validation  by  comparing  it  to  other  methods 
on  the  same  data.  This  is  referred  to  as  second-hand  validation. 

Consider  the  kernel  method  for  instance.  A first-hand  validation 
would  require  n exact  F(»)  values.  This  would  not  be  a very  inform- 
ative validation  since  we  know  that  if  n is  large  enough,  even  linear 
interpolation  would  give  us  a good  fit  for  the  derivative.  The  same  can 
be  said  of  the  spline  method. 

In  the  autoregressive  method,  a first-hand  validation  requires 
knowledge  of  the  true  R(»)  sequence.  How  many  elements  of  the  sequence 
do  we  need  to  get  an  approximator  f (•)  close  enough  to  f(*)  ? Now 
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this  question  remains  meaningful  in  the  real  data  case  because  the 
sample  si^e  n does  not  compel  us  to  choose  any  order  m . So  any 
insight  we  gain  from  the  validation  process  becomes  handy  when  we  are 
confronted  with  real  data.  The  same  can  be  said  of  weighted  Fourier 
series. 

Another  reason  to  go  through  this  process  has  to  do  with  the 
structure  of  the  estimation  problem.  Suppose  we  want  to  estimate  a 
function  f(»)  and  f(«)  can  be  expressed  in  a form  suitable  for 

CD 

approximation  by  functions  ®ight  then  estimate  f (•) 

A 

by  f (•)  . Now  we  have  that 
m 

lf(.)  - fm(*)l  2 * !(«•)  - £„<•>]  + |fn(->  - *m<-))l2 

The  validation  process  can  be  of  much  help  in  the  study  and  control  c.y 

the  bias  function  b (•)  : 

m 

b (•)  - f(«)  - f (•) 

m m 

In  this  chapter  the  emphasis  is  on  validating  the  autoregressive 
method,  though  we  will  make  some  comparisons  with  the  weighted  Fourier 
series  method. 

In  the  probability  density  case,  the  function  R(*)  is  simply 


( 


the  characteristic  function.  Accordingly  we  shall  adopt  the  more  usual 
notation  cp(*)  in  the  ensuing  discussion. 


- !«,- 


1.1  The  Idea  of  Truncating 


A natural  choice  to  start  with  is  the  Cauchy  density 


f(x) 


rr  ..  . 2.  tt  . , .2 

<1  + x ) 1 1 + ix 


with  characteristic  function  cp(v)  = e lvl  , as  our  approximators  are 
of  the  form 


f 00 

m 


5s 

2rr  * 


m 


Figure  1 reproduces  the  first  picture  we  obtained.  Obviously 
something  had  gone  wrong.  But  then,  our  approximators  are  only  defined 
on  [— TT,tt1  : more  precisely  they  are  periodic  with  period  2tt  ; and 

the  interval  [-tt,TT]  contains  only  about  807.  of  the  Cauchy  distribution. 
Moreover,  even  though  the  truncated  density  is  proportional  to 

the  complete  density  f(«)  on  the  interval  of  truncation  T , it  is 
zero  outside  that  interval.  That  is 


fT(x)  = 


J1*L 


J*  f(u)  du 

T 


6T 


x£t 


Thus  the  Fourier  transform  of  f^(»)  , , is  not  proportional  to 

the  Fourier  transform  of  f(*)  , cp(*) 
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Let  T be  the  interval  [-77, TT]  . When  f(*)  is  known,  we  can 
evaluate  cp^(»)  directly  using  the  Fast  Fourier  transform  technique. 
If  only  Cp(«)  is  known  and  is  integrable,  then 


?T(v) 


= -J* 

TT 


Cp(uu) 


sin  tt(uu  - v) 


(JJ  - v 


d(jj 


where 


SB 

..-1  r ...  . sin  TT  tu  , 

K = J Cp(uu)  — dm 

—00 

Using  this  correction,  we  produced  Fig.  2 and  3 that  show  a 
remarkable  fit.  The  distortion  is  not  as  important  when  most  of  the 
area  is  contained  in  the  interval  T . 

What  happens  when  the  density  is  defined  only  on  a subinterval 

of  [-tt,tt]  ? Consider  for  example  the  uniform  density  on  [-1,1]  with 

characteristic  function  Cp(v)  **  ^ . Figure  4 shows  a wild  behavior 

that  is  corrected  only  when  the  uniform  density  is  made  to  fill  the 

whole  interval  [-TT,tt]  as  then  cp(0)  **  1 and  cp(v)  = 0 , v ^ 0 , 

which  yields  f (•)  = — for  all  values  of  m . 
m ^rr 

Note:  The  symbol  o represents  the  true  curve  and  the  solid  line 

the  fitted  one,  unless  otherwise  noted. 
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1.2  The  Idea  of  Averaging 

The  wild  behavior  depicted  in  Figure  4 is  more  or  less  typical. 
This  effect  wears  off  quite  rapidly  in  certain  cases,  but  in  others  it 
persists.  This  difference  in  behavior  is  intriguing.  Let  us  see  the 
evidence. 

Consider  the  standard  Gaussian  density.  Why  pay  much  attention 
to  the  low  order  approximators  (Fig.  5 and  6)  as  convergence  is  still 
taking  place  (Fig.  7)?  We  could  disregard  this  as  an  amusing  oddity  that 
unimodality  and  bimodality  alternate.  But  when  we  shift  the  density  by 
e.  small  amount  to  a N(0.5,l)  , the  oddity  becomes  alarming  because  it 
persists  and  is  aggravated  (Fig.  8 and  9). 

Now  let  us  superimpose  these  pictures  (Fig.  10  and  11).  We  get 
the  remarkable  fact  that  successive  orders  complement  each  other,  so  that 
if  we  average  them  (Fig.  12),  the  bias  is  greatly  reduced. 

This  is  not  a problem  of  stability  of  the  algorithm  used,  as  the 
phenomenon  is  observed  both  with  real  and  complex  characteristic  func- 
tions. More  likely,  this  is  related  to  the  very  structure  of  the 
approximator.  We  will  have  to  examine  this  in  Part  II  in  our  study  of 


the  bias  function. 
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1.3  The  Idea  of  Symmetrizing 

The  next  important  class  of  functions  to  be  used  in  our  valida- 
tion study  is  the  class  of  gamma  densities  (exponential,  chi-square,  ...) 

The  exponential  density,  with  its  discontinuity  at  the  origin, 
was  expected  to  be  difficult  to  approximate  mainly  because  the  approxima- 
tors are  periodic  (f  (-tt)  = fm(Tr))  . Figures  13  and  14  illustrate 
our  fears.  One  way  to  compensate  the  basic  disequilibrium  of  the  tails 
is  to  symmetrize  the  density  so  that  both  tails  are  equal. 

We  would  like  to  point  out  that  symmetry  is  not  what  matters  but 
rather  that  the  tails  be  comparable,  as  our  approximation  of  non-central 
Gaussian  densities  show  (fig.  12). 

By  symmetrizing  the  exponential  density,  we  get  the  Laplace  (or 
double-exponential)  density,  and  the  discontinuity  at  the  origin  is  now 
shifted  down  to  the  first  derivative.  In  Figure  13  ve  see  that  the 
peak  has  been  somewhat  smoothed,  but  the  fit  is  really  good.  The  right 
half  gives  us  a nice  exponential  curve. 

The  chi-3quare  densities  with  larger  degrees  of  freedom  do  not 
present  any  discontinuity,  but  their  left  and  right-hand  tails  are  very 
different.  We  shall  compare  symmetrization  to  scaling. 

Let  Y be  a random  variable  distributed  as  a chi-square  with 
four  degrees  of  freedom.  Let  X * 0.5 Y . The  density  of  / ' % 

-x 

= x e 


f(x) 


I 


. Let  us  examine  the 


- li  - 

We  symmetrize  it  to  f^(x)  ~ 0*5  |x|  e 

results. 

By  comparing  Figures  16  and  17 , we  see  that  symmetrization  pro- 
duces a smoother  approximator  than  non- symmetrization  for  a given  order. 
The  same  holds  true  after  averaging  two  consecutive  orders  (Fig.  19 
and  20).  But  when  we  symmetrize,  we  usually  have  to  go  to  higher  orders 
to  get  a better  fit  (Fig.  18).  It  is  also  possible  to  average  more 
orders  to  get  better  results  (compare  Fig.  21  with  Fig.  19). 

Let  us  now  look  at  the  density  of  X * 0.25 Y , i.e., 

- 2x 

f(x)  a 4xe  . The  right  extremity  is  at  about  the  same  height  as 
the  left.  The  approximators  now  perform  well  at  both  tails  (Fig.  22), 
illustrating  that  syianetrization  is  not  what  we  really  need  when  the 
mode  is  more  centrally  located. 

Note  that  symmetrization  here  is  not  taken  in  the  same  sense  as 
in  Feller  (1966).  The  difference  is  apparent  at  the  characteristic 
function  level.  Feller’s  procedure  transforms  the  characteristic 
function  into  its  square  modulus.  Our  procedure  transforms  it  into 
its  real  part. 


c 




g ' 

JL— — — _____ _____ _____ 


0.5  X2  (4  d.f.) 
averaging  orders  10  and 

i 


11 


Symmetric  0.5  X2  (4  d.f.) 
averaging  orders  9,  10,  11  and  12 


0.25  X2  (4  d.f.)  order  19 


f 
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1.4  Comparison  with  the  Weighted  Fourier  Series  Method 

We  first  note  that,  in  the  case  of  validation,  the  weighted 
Fourier  series  method  uses  only  0-1  weights,  i.e.,  the  density  is 
approximated  by  successive  truncated  Fourier  series.  This  method 
requires  as  in  the  autoregressive  method,  that  we  truncate  the  density 
to  [-iT,Tl]  . We  also  note  tnat  this  is  the  standard  way  to  invert  a 
known  Fourier  transform.  Thus,  it  should  perform  rather  well. 

For  the  Cauchy  density,  there  is  almost  no  difference  between 
the  two  methods  (Fig.  23  and  24  compared  to  Fig.  2 and  3).  But  in  the 
case  of  the  triangular  density  on  [-tt,tt]  , the  autoregressive  approxi- 
mator wobbles  about  the  true  density  (Fig.  25);  averaging  orders  8 and 
9 reduces  the  bias  pretty  much  (Fig.  26).  But  thr  truncated  Fourier 
series  is  right  on  the  target  (Fig.  27). 

In  the  exponential  case,  we  notice  the  same  "odd-even"  effect 
(Fig.  28  and  29).  But  there  are  differences:  both  tails  are  badly 

approximated  (compared  with  Fig.  14)  and  the  approximators  become  nega- 
tive. Symmetrizing  improves  the  matter  (Fig.  30)  as  in  the  autoregressive 
method  (Fig.  15). 


f 


1.5 


Approximation  of  a Hazard  Function 


When  R(»)  is  taken  as  the  Fourier  transform  of  log  (1  - F(«)) 
the  autoregressive  method  produces  approximators  of  the  hazard  function 
related  to  the  distribution  F(*)  . In  view  of  our  previous  work,  we 
did  not  feel  that  it  was  necessary  to  include  more  than  one  example, 
which  we  have  taken  to  be  the  hazard  function  of  a truncated  exponen- 
tial (Fig.  31  and  32). 

From  the  hazard  function,  it  is  possible  to  recover  the  original 
density  as  follows: 

r* 

f (x)  = h(x)  exp(— J h(u)  du) 

0 

When  we  apply  this  transformation  to  our  hazard  approximators,  we  obtain 
the  density  approximators  represented  in  Figures  33  and  34.  (Note  we 
modified  the  hazard  approximators  at  the  origin  to  be  1 .) 
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1.6  The  Chi-square  case  revisited 

The  chi-square  densities  having  shown  to  be  more  difficult  to 
approximate,  they  furnish  a valid  testing  ground  for  some  alternate 
representations  of  the  densities.  We  have  already  mentioned  the  hazard 
representation  and  the  excellent  results  it  produced  for  the  exponential. 
In  thp  chi-square  case  we  obtain  Figures  35  and  36  . The  results  are 
not  so  good. 

There  is  another  representation  that  has  not  been  used  very 
often,  namely  the  sparsity  representation  mentioned  in  the  Appendix  0.A.1 
(item  2).  The  pictures  we  obtain  (Fig.  37  and  38)  compare  favorably  to 
direct  approximators  of  higher  order  (e.g.,  Fig.  20). 
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1.7  A Look  at  the  Output  Parameters 

We  have  left  aside  up  tc  now  all  numerical  considerations.  To 
draw  all  the  information  contained  in  our  validation  work,  there  only 
remains  to  consider  the  output  parameters  that  define  our  approximators 
the  coefficients  and  the  proportionality  factor  that 

normalizes  the  approximator  to  integrate  to  R(0)  . 

We  consider  only  four  cases  to  illustrate  the  points  we  want  to 

make. 

Table  1.1  exhibits  the  relation  between  the  parameters  and  the 
kind  of  picture  we  get.  and  the  coefficients  {a  are  con- 

verging nicely,  the  same  way  f^C*)  approximates  the  Cauchy  density 
(Fig.  2 and  3). 

Table  1.1  Some  parameters  of  the  Cauchy  density  autoregressive 
approximator 

Order  Coefficients  Scale  Factor 


m 

alm 

°2m 

K 

m 

1 

-0.4838 

- 

0.7658 

2 

-0.5310 

0.0974 

0.7586 

3 

-0.5346 

0.1175 

0.7575 

4 

-0.5354 

0.1198 

0.7572 

5 

• 

-0.5356 

0.1203 

0.7571 

• 

• 

11 

-0.5358 

0.1207 

0.7570 
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When  K gees  to  0 , the  coefficients  do  not  converge,  ns  m 
m 

Tabic  1.2.  There  is  also  the  danger  that  Km  will  become  negative,  in 
which  case  the  approximator  tj-l  is  negative  over  the  whole  domain 
. Figure  4 is  typical  of  the  kind  of  picture  associated  with 

this  behavior. 


Table  1.2  Some  parameters  of  the  Uniform  (-1,1)  density 
autoregressive  approximator 


Order 

Coefficients 

Scale  Factor 
K 

m 

^lm 

a2m 

m 

1 

-0.8414 

0.2919 

2 

-1.5719 

0.8681 

0.0719 

3 

-2.3304 

2.2415 

0.0170 

• 

• 

• 

11 

-8.4770 

34.5193 

1.39  x 10 

Table  1.3  exemplifies  slow  convergence.  In  such  a case  we  have 
found  it  helpful  to  average  consecutive  orders.  The  related  pictures 
are  shown  in  Figure.:  7,  8,  11  and  12. 


Table  1.3 


Some  parameters  of  the  Normal  (0.5,  1)  density 
autoregressive  approximator 


Order 

Coefficients 

Scale  Factor 

m 

alm 

a2m 

K 

m 

1 

(-0.5414,0.2936) 

0.6205 

2 

(-0.7592,0.4069) 

(0.2230,-0.3302) 

0.5220 

3 

(-0.8596,0.4538) 

(0.3679,-0.5211) 

0.4816 

9 

(-1.0079,0.4961) 

(0.6383,-0.7415) 

0.4245 

10 

(-1.0152,0.4966) 

(0.6535,-0.7471) 

0.4218 

11 

(-1.0210,0.4969) 

(0.6656,-0.7511) 

0.4196 

From  Tables  1.4  and 

1.5,  there  does  not 

seem  to  be  differences 

in  convergence;  this  reaffirms  our  finding  that  there  i3  no  real  gain 
in  symmetrizing  when  the  mode  is  relatively  central.  This  is  in  sharp 
contrast  with  the  exponential  and  Laplace  cases  in  Tables  1.6  and  1.7, 
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„ Table  1.4  Some  parameters  of  the  0.5  Chi-square  (4)  density 

^ autoregressive  approximator 


Order 

Coefficients 

Scale  Factor 

m 

alm 

c-2n 

1 

(-0.1860,-0.1224) 

0.9503 

2 

(-0.2111,-0.1173) 

(0.1069,0.0426) 

0.9377 

3 

(-0.2207,-0.1155) 

(0.1264,0.0369) 

0.9310 

10 

(-0.2461,-0.1142) 

(0.1614,0.0347) 

0.9097 

11 

(-0.2482,-0.1143) 

(0.1642,0.0350) 

0.9078 

Table  1.5 

Some  parameters 

of  the  symmetrized  Chi- 

square  (4)  density 

autoregressive 

approximator 

Order 

Coefficients 

Scale  Factor 

m 

“lm 

a;!m 

K 

m 

1 

-0.0950 

0.9909 

2 

-0.1138 

0.1970 

0.9524 

3 

-0.1065 

0.1928 

0.9511 

18 

-0.0854 

0.2432 

0.9162 

29 

-0.0837 

0.2449 

0.9140 

f 
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lable  1.6  Some  parameters  of  the  exponential  density 
autoregressive  approximator 


Order  Coefficients  Scale  Factor 


m 

alm 

K 

m 

1 

(0.4998,-0.5001) 

0.4999 

2 

(0.3997,-0.8001) 

(-0.4001,-0.1997) 

0.3999 

3 

(0.2996,-0.9001) 

(-0.6000,  0.0004) 

0.3599 

13 

(0.0782,-0.9936) 

(-0.5703,  0.4086) 

0.2936 

14 

(0.0735,-0.9943) 

(-0.5664,  0.4149) 

0.2923 

Table  1.7  Some  parameters  of  the  Laplace  (symmetrized  exponential) 
density  autoregressive  approximator 


Order  Coefficients  Scale  Factor 


m 

alm 

a2m 

K 

m 

1 

-0.5456 

0.7023 

2 

-0.6209 

0.1381 

0.6889 

3 

• 

-0.6331 

0.1928 

0.6835 

10 

-0.6370 

0.2027 

0.6820 

11 

-0.6370 

0.2028 

0.6820 

- HO- 


% 


% 


1.8  Conclusion 

1.  It  is  important  to  realize  that  the  autoregressive  method  approxi- 
mates only  functions  truncated  to  [-tt,tt]  and  that  the  domain  of  defini- 
tion of  these  functions  should  fill  the  whole  interval. 

2.  There  seems  to  be  a structural  "odd-even"  effect  that  can  be 
averaged  out. 

3.  Symmetrizing  improves  the  behavior  of  the  approximators  when  the 
functions  have  a maximum  at  the  left  end  of  the  domain. 

4.  In  view  of  the  periodicity  of  the  approximators,  they  perform  better 
when  both  ends  of  the  functions  are  comparable. 

5.  Satisfying  approximators  are  related  to  the  convergence  of  the 
parameters . 
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APPENDIX 


i.A.I  Sample  Programs  for  Approximation 

We  include  in  this  appendix  three  sample  programs,  one  for  each 
of  the  approaches  we  used. 

Each  program  is  divided  in  3 parts: 

I - Computing  the  R(»)  sequence 

II  - Solving  the  Yule-Walker  equations  in  AUTOREG  - This  subroutine 

can  be  found  in  the  appendix  to  Chapter  3. 

III  - Computing  the  density. 

In  the  direct  approach,  II  and  III  are  confounded. 

1st  program:  Approximating  a symmetrized  chi-square . 

At  the  beginning,  A(*)  contains  the  function  to  be  approxi- 
mated. Then,  using  two  IMSL  subroutines  FFT2  and  FFRDR2  , we  compute 
the  R(»)  sequence  that  Is  stored  in  A(»)  . This  is  the  end  of 
Part  I. 

In  Part  II,  we  solve  the  Yule-Walker  equations  and  compute  the 
approximator  stored  in  F(»)  . NORM(»)  contains  the  true  truncated 
density  being  approximated  ( NORM(»)  is  used  only  for  plotting  purposes). 
We  also  average  four  consecutive  orders,  the  average  being  stored  in 
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2 program:  Approximating  the  hazard  of  a chi-square. 

The  first  two  parts  are  as  in  the  first  program.  At  the  end  of 
Part  II,  F(»)  contains  the  approximated  hazard. 

In  Part  III  we  reconstitute  the  density  from  the  hazard  function. 
At  the  end  of  Part  III,  F(*)  contains  the  approximated  density. 

rd 

3 program:  Approximating  the  sparsity  of  a chi-square. 

In  Part  I,  G(»)  contains  the  chi-square  density  and  CF(»)  is 
the  quantile  function  obtained  by  the  trapezoidal  rule.  Then,  we 
rescale  CF(»)  to  be  between  -TT  and  TT  , using  subroutines  CENTER 
and  XSCALE  , and  compute  in  FOURS T I the  R(»)  sequence  stored  in 
A(  *)  : 


R(v) 


TT 

s 

-IT 


iv(2TT‘F(x)-TT) 


. {*  ivt  ,t+n. 

dx  = J e q(-2F)dt 

-TT 


by  letting  t = 2tt*F(x)  - TT  . 

In  Part  II,  we  solve  the  Yule-Walker  equations.  F(*)  now  con- 
tains the  sparsity. 

Finally  in  Part  III,  we  recover  the  density.  Subroutine 
FOURSTI  simply  evaluates 


n 

S e 

j=l 


ivX(j) 


v 


1 2 
L ’ L 


M • L 


F(j)  , 


L 


ca- 


using the  function  CSREC  , which  is  a recursion  for  cosines  and  sines: 

FUNCTION  CSREC (Cl, C2,C3) 

CS  REC  = Cl  * C2  - C3 

C3  * C2 

C2  *=  CSREC 

RETURN 

END 


i 

I 

i 
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3ROGRAN  SWEEP  l INPUT  » OUT?  UT  , TAPE 5= IN PU  T »TA  P£6=0UTPUT  > 
COMPLEX  AA 

COMPLEX  ALPHA(oG),PHIl3i)),JH,KH 
INTEGER  IWK  (121 
COMPLEX  A (2048 ) 

DIMENSION  5 (10c) 

DIMENSION  X (102)  ,F  (102) 

REml.  N0RMUC2) 

3 1 = 4.*  AT  AN  ( 1.  0) 

TWOPI=2.*PI 


>1  1 = 2048 

FNI=Nl  — - - - - - •:  - 

30  21  J=1,M  I 
F REv3=  TWOPI*  (J-l)  /FNI 

A (J)=CMPLX(  AaS(FREQ-PI)  *EXP(-AaS(FR£ft-PI*  P,  0 *)- 

21  CONTINUE 

SAIL  FFT21A  *11*  IWK)  • • 

• CALL  FFR0R2 (A,11,IWK) -- - - - -=  

A A = A( 1) 

* = 15 

30  3 J=1,M  - - -=  - — 

A (JI  = Al  Ji *(-!)**  U-l)/A4  1) 

3 CONTINUE 

Ah=AA-TWOPI/FNI 
PRINT*, aA 
NP=1CQ 
F Nr=NP 

30  13  1=1, NP 

<(i)  = -PIMI-ll*TWOPI/FNP 

3 ( I)=Q.  - - - - - - 

NORM(I)=AB$  (X(l  ) ) ♦EXP  (- AB5  CX(I)>l/AA 
13  CONTINUE 


302  <=2,M 

CALL  AUTOREG(  A ,K  ,H  ,NP.,Al  PHA  ,PHl  ,UH,KH  ,X,F  ) 

I F (K.NE • 10 . ANO.K .NE» 11 • ANO .K»N E • 12. AN NE • 13)  60  TO  2 
30  Zh  1=1, NP 


»(I)sS(I)+0«25*F(I)  - • 
'24  CONTINUE 


2 CONTINUE 

STOP 

END 
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t-  P oGP  A N INPUT, OUTPUT  , T AP  £5=  INPUT  , T APE  6=  OUT  PU  T ) 

COMPLEX  ~ U* 43 ) 

uO-iF-E*  AcPHAt  3-)  »PHI(  ?:>,  oH.KH 
xM  Tc.Gc. F I WK ( 12 ) 

REAL  i»CRF  { -5- ) , X(j.c3)  , F (^50) 

DIMENSION  GU3:  ) ,CF  1 1-:  ) 

Oi/ENSION  R<3r),CC3C> 

PI  = U.-ATAN  (1 . C ) 

T WuP 1=  2 . * p I 

NI=*Qa8 

FNI=NI 

. TT  = (TWOPI*l,)‘le.XP  (-TWO PI) 
a A =1.-TT 
00  21  J=1,NI 
FRE0=TW0PI*  (J-D/FNI 

A ( J ) =CH  PL  x <FRE0*EXP  £ -FrtEO  I / U FREQ  *-1  . )♦  EXP  (-FREQ)  -TT)  ,0.) 
LI  CONTINUE 

CAlL  FFT2 ( A,  11, I^K) 

CALL  FFRDF.2(A,11,IWK) 

M = 27 

00  3 J = 1,M 

A ( J ) = A (U)^(-l)  **<J-1>  *TH0PI/FNI 
•»  CONTINUE 
NP=U0 
FNP=NP 

00  13  1=1, MF 

X(i|s-FI*  (*-l) ‘TWCPI/FNP 
NOPMtx  ) = ( X(lH-PI>*cXP  ( -<X(  a)  fPI)  )/aA 
lo  CONTINUE 
H=PI/FMP 

CO  2 K=2,M 

CALL  AUTGREG(A,K,P,NP, ALPHA, PHI, JH,KH,X,F) 

IF  (K«Nc.i7.At\0.K.N2.27)  GO  TO  2 
PRINT" , F 

F(1S=0. 

F ( ? ) =c . 3 5 
G(l»  = u . 

JO  24  1 = 2, NP 

GiI)=GU-i)*rtMF(I-iJ  *F  <I>  > 

24  CONTINUE 
00  25  1=1, NP 
F(I)=F(I>*£X°(-G(I> ) 

25  CONTINUE 


PRINT*,  F 
? CONTINUE 


5T0? 

t'lOj 


PROGRAM  SWtEPlINFUT,QCTPUT,T*Pc5  = INPUT,TAP£6  = CUTPUT) 

COMPLEX  A < 30 » 

ClKFLEX  ALPPA  C30  )»PHI(3L  )»  JH»  Kh 

?EAL  NQRMt  15«)  »X  1150  » »F  ( 15  0J 
DIMENSION  G(13«i)  *CF(  1 20 » 

DIME  fcSICN  R(30),C(3Q> 


PI=4.*ATAN(1.0i) 

TW0PI=2.*PI 

N 1 = 12  9 

FM=Nl 

^FI/FNl 

00  21  J=i»Nl 

FRFQ=TWCPI*<J“i>  / FNI 

& ( JJ  =FREQ*£XP  (-FREQ) 

21  CONTINUE 

c f (i ) =o . 

00  4 1 = 2, NI 

:fci»«cfii-iimg(i-i»»6ci»i*h 

4 Z CKTINUE 
AA=CF(NI) 

00  5 1=1. NI 
CF(I1  = CF(I)/CF(NI> 

5 ( I)  = 2 »*H 

5 CONTINUE 
PRINTS, CF 
XMI0  = Q.5 

X RNG—  1 • 

5CL=TW0PI/XRNG 

CALL  CcNTER  (CF,CF,NltXMID) 

CALL  XSCmLE (CFtCF.NI* SCL) 
print*»cf 

1 = lQ 
L=  1 

CALL  FOURSTI(CF,Nl,R,C,M,L,G) 

PRINT*  ,CF 
OC  3 J=2,M 
L = J-1 

A (J)=CMPlX(R(U,C(LI) 

3 CONTINUE 

A (l)=CrtPLX(  THCPI.O  .) 

NP  = 12  8 
FNF=NP 
PRINT*  »A 
OC  13  1 = 1, NP 

<(I)=-PIMI-1I*TW0PI/FNP 

NORM!  I )=  (X(  ID  ♦FI  >*EXP  (-  IXi  I»*PI>  »^AA 

G < I)  = C. 

13  CONTINUE 

PRINT* .NORM 


_ hi  - 


HI  Au?WE6IA,K.H.«f.“-Prt«.PHi-'H'l<H’Cr,F1 

IF  IK  .LT.8>  GO  TO  2 <H 

^RIT£(6,6»  (aLFHAtX»,I=lj<>^H 
6 FCK«ATl//fHl.**C2F8.4,3<)) 

IF(K.Nc.a.AN0.K.N£.9>  *0  TO  2 

OC  25  1-i.fNF 
F (I)  = l./ (Ft  I)¥TWCPI) 

25  CONTINUE 
PRINT*  *F 
00  24  I=1»NF 
G tI»  = G(I)  + 0.5*f tl) 

24  CONTINUE 


l t 


l 


2 CONTINUE 


STOP 

END 


c 


SudROUTINE  FCURSnt)<.N.CPHI.|Pt|I.HjL,E) 
DIMENSION  XIN),F(N>,CFHI11>.SPHI  HI 
p i=4.*ATANI 1*  0> 

£ l=float  tu 

M L=M*  L 

30  51  J-1*NL 
CPHlt  J)=SPHI(  Ji  =o» 

50  CONTINUE 

IF  (AeSlXt  IH.GT.PI>  SC  TO  104 

XX=XtI)/EL 
C1=C0S(  XXI 
CQ=2.*C1 
C 2-1* 

Sl=SlNtXX) 

CPrm>  = CPHlU>*Ci*FU) 
5PHlll)=SPHl4l)fSl*FtI) 

rpHxiJJ=CPHlU>  *CSREC  ICOtC  1*021 
5PHl(  J )aSPHlt  J ) ♦CSREC  (C0,S1,S2»F(I> 

150  CONTINUE 
IOC  CONTINUE 
iC  TO  5 

lit  F0RN41 1 tHl^ 2X 1 2NH0AIA  NOT  NIXED  FROFERU. 
5 CONTINUE 

return 

END 


SH°OUT£  ME  CENTEfcl  * ,Y,  N.XMIQ) 
^Ii’.ENSION  V(l) 

DIMENSION  X (1  ) 

X1'  T=  X M I D 

Ir  t N »3  T , C J XFT=-XMIQ 
NN=IA9S(N> 

00  9 1=  L » »*N 
9 r (I)=X(I) fXFT 
RE  TURN 
EX  3 


SUBPOUTI^E  XSOALE(X,Y,N,SDL> 
OIMer.SION  Y (1) 

DIMENSION  X (1) 

PTSCl=SCL 

IF(N.LT.O)  PISC  L=1  ./PI  SCL 
NN= I ABS ( N1 
DO  19  1=1,  NN 
1C  Y< I)=X  fll-HlSCL 
°ETl)PN 
END 


CHAPTER  2 


estimating  densities  and  hazard  functions 
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The  kind  of  validation  that  we  have  completed  in  chapter  1 is 
not  sufficient  for  a method  that  is  to  be  used  in  a statistical  context. 
We  have  to  confront  it  with  real  data.  This  testing  can  best  be  done 
by  what  was  called  before  a second-hand  validation. 

Even  though  our  main  interest  remains  with  the  autoregressive 
method , we  will  have  to  take  a longer  look  at  the  competing  techniques 
that  we  mentioned  in  our  introduction,  examining  critically  different 
more  or  less  inaccurate  maps  to  recognize  the  ground  we  are  standing  on. 
We  do  not  define  with  any  more  precision  the  tasks  that  these  methods 
could  be  asked  to  perform.  It  will  suffice  for  the  moment  to  see  how 
they  describe  the  data.  This  is  admittedly  an  incomplete  assessment 
from  which  we  shall  not  try  to  state  definitive  answers. 

This  critical  examination  will  be  done  using  a diversity  of  real 
data  situations:  approximately  normal  data,  approximately  exponential 

data  and  frequency  data.  But  first  we  indicate  how  we  have  used  the 


different  methods. 
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2.1  Choice  of  Input  Parameters 

2.1.1  The  Kernel  Method 


The  k..mel  method  contains  two  input  parameters,  namely  the 

kernel  function  and  the  set  fh.‘}S?  , of  bandwidths. 

1 r J=1 

Wa  have  used  different  kernels:  Parzen  kernel  (Fig.  1,  2,  18,  30), 

Gaussian  kernel  (Fig.  3),  naive  kernel  (Fig.  4).  It  is  clear  that  the 
choice  of  a kernel  is  an  important  one.  The  Parzen  or  Gaussian  kernels 
will  produce  in  general  smoother  estimators.  • 


We  have  always  used  constant  bandwidths  = h , even  though 
there  are  algorithms  of  the  nearest  neighbor  type  to  adapt  the  h^*s  to 
each  data  point,  following  Loftsgaarden  and  Quesenberry  (1965). 

For  the  particular  kernels  we  used,  the  optimal  value  of  h is 


given  by 


h 


opt 


1 f(x)  r K2(y)  dy 


4n[f"(x)  * J y K(y)  dy]" 


1/5 


[f(x)11,/5  j"  KERFAC 

,2/5”  * [ n 


[2f"(x)r 


1/5 


(Parzen  (1962)) 


i 


» 


We  have  approximated  this  by 


K • STDEV  • 


KERFAC 


1/5 


n 
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where  STDEV  is  the  standard  deviation  of  the  sample.  The  proportion- 
ality constant  K , if  it  is  too  large,  yields  very  flat  estimators,  and 
if  too  small,  very  spiky.  A value  around  2/3  has  worked  very  well. 
Note  also  that  STDEV  is  very  sensitive  to  outliers  so  that  adjustments 
may  have  to  be  made. 

2.1.2  The  Quantile  Expansion  Method. 

The  only  parameter  here  is  the  order  of  approximation.  So  we 
use  the  method  iteratively,  until  we  are  satisfied. 

2.1.3  The  Spline  Method. 

We  have  usually  fitted  splines  under  tension  to  the  empirical 
quantile  function.  The  input  parameters  include  the  tension  factor,  the 
set  cf  knots  {x^}._^  and  t*ie  end-point  conditions. 

The  choice  of  the  knot  points  is  very  delicate  and  crucial.  We 
usually  started  with  13  equidistant  knots  between  0 and  1 . Then 
we  moved  them  around  and  reduced  their  number  according  to  the  pictures 
we  were  getting.  This  procedure  is  necessary  when  the  estimated  quan- 
tile function  starts  to  decrease,  contrary  to  the  theory. 

When  the  tension  factor  is  less  than  .001  , the  spline  under 
tension  is  very  much  like  the  cubic  spline;  for  values  larger  than  50  , 
it  is  a polygonal  line. 

The  end-point  conditions  consist  in  this  case  of  estimates  of 
the  first  derivative  of  the  quantile  function  at  both  ends,  usually 


first  differences. 
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2.1.4  The  Weighted  Fourier  Series  Method. 

The  input  parameters  here  are  the  empirical  Fourier  transform, 
the  weights  and  the  order  of  approximation. 

So  again  we  use  the  method  iteratively  and  estimate  the  optimal 
weights  by  using  the  empirical  Fourier  transform  cp^(«)  . Thaler  (1974) 
has  shown  that  for  large  v , cp^(v)  i-s  not  a good  estimate  of  cp(v) 
in  the  sense  that 

lim  Var  cp  (v)  = — 

i m ^ n 

7 j-»® 

But  this  should  not  cause  us  to  worry  as  we  usually  need  only  small  val- 
ues of  v , and  we  can  check  graphically  (Fig.  9)  that  we  are  in  a safe 

2 

part  of  the  domain  by  finding  the  value  at  which  |cp^(v)  | starts 

oscillating  around  1/n  . 

This  procedure  is  simpler  than  Thaler ?s  own  proposal  and  it  also 
performs  better. 

2.1.5  The  Autoregressive  Method. 

The  autoregressive  method  has  the  same  kind  of  parameters  as 
the  weighted  Fourier  series  method,  minus  the  weights- 

The  Fourier  transform  R(»)  is  estimated  by  Rq(*)  from  the 
sample.  Not  only  is  the  method  used  iteratively,  but  there  is  a recur- 
sive algorithm  to  go  from  one  order  to  the  next  (see  appendix  3.A.1). 
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2.2  Buffalo  Snowfall  Data 

T, 

4 

This  set  of  data  has  been  much  studied  in  our  department,  more 
from  the  time  series  point  of  view.  It  consists  of  the  63  yearly  values 
of  snow  precipitation,  recorded  to  the  nearest  tenth  of  an  inch,  from 
1910  to  1972.  It  was  chosen  to  illustrate  the  response  of  the  different 
methods  to  approximately  normal  data  (see  Fig.  Al  and  A2). 


Table  A 
Year  0 

1 

Buffalo  Snowfall  Data 
2 3 4 5 

6 

7 

8 

9 

1910 

126.4 

82.4 

78.1 

51.1 

90.9 

76.2 

104.5 

87.4 

110.5 

25.0 

1920 

69.3 

53.5 

39.8 

63.6 

46.7 

72.9 

79.6 

83.6 

80.7 

60.3 

1930 

79.0 

74.4 

49.6 

54.7 

71.8 

49.1 

103.9 

51.6 

82.4 

83.6 

1940 

77.8 

79.3 

89.6 

85.5 

58.0 

120.7 

110.5 

65.4 

39.9 

40.1 

1950 

88.7 

71.4 

83.0 

55,9 

89.9 

84.8 

105.2 

113.7 

124.7 

114.5 

1960 

115.6 

102.4 

101.4 

89.8 

71.5 

70.9 

98.3 

55.5 

66.1 

78.4 

1970 

120.5 

97.0 

110.0 

( 
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2.2.1  The  Kernel  Method 

We  experiment  first  with  different  values  of  h . Figure  1 
shows  the  effect  of  choosing  h too  large  (h  = 27.75)  in  comparison 
with  Figure  2 (h  = 18.5)  , where  the  kernel  used  is  the  Parzen  kernel. 
Now  we  can  compare  the  Gaussian  kernel  with  the  Par2en  kernel:  with  the 

best  choice  of  h , they  yield  the  same  estimate  (Fig.  2 and  3).  Thus 
the  Parzen  kernel  is  equivalent  to  the  Gaussian  kernel.  The  naive 
kernel  yields  spiky  results  even  with  the  best  choice  of  h (Figure  4). 
However  the  same  basic  shape  can  be  distinguished.  Note  that  this 
method  does  not  impose  any  truncation  on  the  data  so  that  the  tails 
always  go  to  zero. 


2.2.2  The  Quantile  Expansion  Method 

In  Figure  5,  the  peaks  are  much  sharper  and  more  separated  than 
in  the  previous  pictures.  It  is  at  that  order  (order  8)  that  the  three 
modes  appeared  for  the  first  tine. 


- 57- 


2.2.3  The  Spline  Method 

As  in  the  previous  case,  the  quantile  function  is  smoothly 
estimated  and  then  differentiated  to  produce  the  density  estimator. 

The  effect  of  increasing  the  tension  factor  tenfold  is  pictured 
in  Figure  6 (o  * 1.5)  and  Figure  7 (3  * 15.)  . In  Figure  8 we  show 

the  estimated  quantile  function  that  produced  the  estimate  in  Figure  6. 
It  was  based  on  8 knot  points  located  at  0.,  0.06875,  0.25,  0.4, 
0.625,  0.833,  0.916,  1.0  . 

2.2.4  The  Weighted  Fourier  Series  Method 

We  determine  first  the  value  v^  which  is  a kind  of  upper 

bound  above  which  it  would  be  "unsafe"  to  use  the  empirical  characteris 

2 

tic  function  to  estimate  Cp(v)  . Figure  9 is  a plot  of  1°8^q  |cpn(v)| 

2 

vs.  1°8^q  v on  a log-log  scale.  v^  is  such  that  |cpn(v)|  oscil- 
lates around  1/n  for  v > Vq  . Here  n = 63  . Thus  we  draw  a 

horizontal  line  at  1/63  s 0.016  to  get  that  v^  > 9 

We  notice  in  Figure  10  that  the  location  of  the  modes  has  not 

changed.  The  values  of  the  two  extreme  modes  r *e  larger  than  in 

Figure  2 because  of  the  truncation,  the  data  having  been  rescaled  to 
fill  the  interval  [-TT,TTl  completely.  The  truncation  is  responsible 
too  for  the  behavior  at  the  left  tail,  which  was  also  heavy  in  Figure  2 
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?.2.5  The  Autoregressive  Method. 

He  include  several  pictures  to  illustrate  the  different  choices 
we  could  make. 

In  the  first  four  pictures  (Fig.  11,  12,  13  and  14),  the  data 
filled  the  interval  [-TT,tt]  . These  pictures  have  the  same  kind  of 
characteristics  as  Fig.  10  (e.g.,  the  left  tail).  The  number  of  modes 
is  a non-decreasing  function  of  the  order. 

With  the  data  filling  only  the  central  2/3  of  the  interval 
t-TT,lT]  , the  modes  appear  much  more  rapidly  (Fig.  15,  16  and  17).  The 
tails  are  even,  but  the  modes  don’t  stand  in  the  same  relation. 

What  choice  should  we  make? 

Following  one  of  our  conclusions  of  chapter  1,  we  can  look  at  the 
output  parameters. 


i 

\ 
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Table  2.1 

Some  parameters 

of  the  Buffalo  snowfall 

data  density 

autoregressive  < 

estimator  (data  rescaled 

to  [ -TT,rr]  ) 

Order 

Coefficients 

Scale  Factor 

A 

A 

A 

m 

a2m 

K 

m 

1 

(-0.2880, 0.0792) 

0.9107 

2 

(-0.2943,0.0935) 

(0.0075,-0.0516) 

0.9082 

3 

(-0.3064,0.0905) 

(0.0546,-0.1084) 

0.8564 

4 

(-0.3094,0.1081) 

(0.0554,-0.1174) 

0.8517 

5 

(-0.3160,0.1118) 

(0.0487,-0.1422) 

0.8*27 

6 

• 

• 

(-0.3221,0.0979) 

(0.0609,-0.1319) 

0.8242 

• 

9 

(-0.4550,0.1386) 

(0.1570,-0.2222) 

0.6769 

Table  2.2 

Some  parameters  of  the  Buffalo  snowfall 
autoregressive  estimator  - o_.  ■» 

(data  rescaled  to  , -j-  J ) 

data  density 

Order 

Coefficients 

A A. 

Scale  Factor 

A 

m 

'hm 

<*2* 

1 

(-0.5843,0.1321) 

0.6410 

2 

(-0.8359,0.2090) 

(0.3813,-0.2178) 

0.5174 

4 

(-1.1969,0.2016) 

(1.0332,-0.3739) 

0.3537 

8 

(-2. 0489,0. 2873 > 

(2.9781,-0.6858) 

0.1011 

After  order  4 in  Table  2.1,  the  parameter  K decreases  by 

m 

bigger  jumps  and  the  coefficients  ajra  vary  more  widely  from  one  order 
to  the  next. 

In  Table  2.2,  the  behavior  is  the  same  as  in  the  similar  situation 
we  encountered  in  chapter  1 (Table  1.2).  The  data  did  not  fill  enough 
of  the  interval  [-TT,Tf]  . 

Figure  12  seems  to  be  the  best  choice.  To  correct  the  left  tail, 
we  could  contract  the  data  to  907.  of  t-TT,TT]  with  minimal  effect  on  the 


modes . 
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2.3  Maguire  Data 


This  set  of  data  was  studied  by  Maguire,  Pearson  and  Wynn  (1952). 

It  consists  of  "time  intervals  in  days  between  explosions  in  mines 
involving  more  than  ten  men  killed,  from  December  6,  1875  to  May  29,  1951." 
There  are  109  data  points  and  only  93  distinct  values  (see  Table  B and 
Fig.  Bl,  B2) . 

The  authors  concluded:  "none  of  the  tests  described  in  this 

paper  demonstrates  lack  of  homogeneity  in  the  series  of  time  intervals." 
This  set  was  also  studied  by  Boneva,  Kendall  and  Stefanov-  (1971) 

(see  Fig.  B3). 


There  were  several  reasons  to  consider  this  set  of  data,  like  the 


possibility  of  an  exponential  underlying  distribution  and  the  effect  of 


the  range. 


Table  B 


Maguire  Data 


378 

36 

15 

31 

215 

11 

137 

4 

15 

72 

96 

124 

50 

120 

203 

176 

55 

93 

59  t 

315 

59 

61 

1 

13 

20 

189 

345 

81 

286 

114 

108 

188 

233 

28 

22 

61 

78 

99 

326 

275 

54 

217 

113 

32 

23 

151 

361 

312 

354 

58 

275 

78 

17 

1205 

644 

467 

871 

48 

123 

457 

498 

49 

131 

182 

255 

195 

224 

566 

390 

72 

228 

271 

208 

517 

1613 

54 

326 

1312 

348 

745 

217 

120 

275 

20 

66 

291 

4 

369 

338 

336 

19 

329 

330 

312 

171 

145 

75 

364 

37 

19 

156 

47 

129 

1630 

29 

217 

7 

18 

1357 

2.3.1  The  Kernel  Method 


The  extreme  observations  have  so  much  effect  on  the  standard 
deviation  of  the  sample  that  we  can  only  obtain  a very  flat  estimate  of 
the  density  using  the  Parzen  kernel.  But  by  omitting  the  nine  largest 
observations  in  the  computation  of  the  standard  deviation,  we  obtain 
Figure  18.  Despite  the  "bias”  towards  normality  exhibited  by  this 
kernel  (as  noted  in  section  2.2.1),  we  recognize  the  very  short  left- 
hand  tail  so  typical  of  the  exponential  density. 

2.3.2  The  Quantile  Expansion  Method 

It  is  not  possible  to  get  a non-decreasing  estimate  of  the 
quantile  function,  and  this  implies  that  one  cannot  form  its  functional 
inverse  which  is  necessary  to  get  an  estimate  of  the  density. 

When  we  consider  only  the  93  distinct  data  points,  we  get  an 
estimate  (Figure  19)  which  is  obviously  biased,  but  nevertheless  can  be 
useful  in  our  comparisons. 


2.3.3 


The  Spline  Method 


The  spline  method  proves  equally  difficult  to  use  on  the  raw 
data.  Because  of  the  difference  of  concentration  of  the  data  along  the 
real  line  and  the  necessity  to  pack  all  the  data  on  a same  graph,  we 
lose  much  of  our  power  of  resolution.  The  spline  method  is  also  tedious 
to  use  as  there  is  no  systematic  way  of  positioning  the  knots. 

After  many  trials,  we  did  not  produce  any  satisfactory  estimate 
for  mostly  the  same  reason  as  in  the  quantile  expansion  method. 


2.3.4  The  Weighted  Fourier  Series  Method 

In  Figure  20  , we  notice  that  the  estimate  produced  by  the 
weighted  Fourier  series  method  has  many  bumps  and  is  negative  at  many 
points.  But  the  two  major  bumps  are  located  at  about  the  same  place  as 
in  Figure  18  . 
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2.3.5  The  Autoregressive  Method 

First  we  look  at  the  raw  data  (Figure  21).  Notice  the  large 
part  of  the  domain  where  the  estimate  i s essentially  zero:  this  corres- 

ponds to  an  interval  that  contains  only  five  isolated  points.  The  two 
modes  are  at  about  50  and  320  as  in  Figure  18.  The  coefficients  in 
Table  2.3  seem  to  indicate  order  4 or  5 (Fig.  21  or  22). 


Table  2.3  Some  parameters  of  the  Maguire  data  density 
autoregressive  estimator 


Order 

Coefficients 

Scale  Factor 

A 

A 

/v 

m 

^lm 

«2n 

K 

m 

3 

(0.7417,-0.7200) 

(-0.0927,-0.2868) 

0.3120 

4 

(0.7149,-0.7027) 

(-0.0288,-0.3606) 

0.2792’ 

5 

(0.6821,-0.8211) 

(-0.1105,-0.3999) 

0.2391 

6 

(0.6805,-0.9146) 

(-0.2540,-0.4430) 

0.2245 

13 

(1.2903,-2.1401) 

(-1.2134,-3.0118) 

0.0152 

In  the  exponential  case,  we  have  found  that  symmetrization  yields 
good  results.  In  the  real  data  case  all  we  need  to  do  to  symmetrize  is 
to  use  the  real  part  of  the  Fourier  transform  and  evaluate  the  density 
on  [0,rr]  . We  notice  again  that  for  the  same  order  the  symmetrized 
estimate  is  smoother  (Fig.  23  vs  Fig.  22).  This  allows  us  to  consider 
higher  orders  (Fig.  25). 
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Table  2.4  Some  parameters  of  the  Maguire  data  density 
autoregressive  estimator  (symmetrized) 


Order 

Coefficients 

Scale  Factor 

A 

A 

A 

A 

A 

m 

alm 

a2m 

a4m 

a5m 

K 

m 

5 

-.7156 

-.2775 

.0539 

-.0764 

.4251 

6 

-.7188 

-.2753 

.0424 

-.1061 

.4244 

7 

-.7200 

-.2723 

.0303 

-.0982 

.4240 

8 

-.7128 

-.2879 

.0227 

-.2039 

.3973 

9 

-.7826 

-.2456 

-.0340 

-.1976 

.3666 

10 

-.8548 

-.1290 

-.1001 

-.1463 

.3419 

We  notice  that  the  shape  changes  slowly  (Fig.  23,  24,  25  and  26) 
from  one  order  to  the  next  as  the  parameters  remain  very  stable. 

But  from  the  big  change  in  the  parameters  of  orders  8 and  9 , it  would 
seem  that  order  8 is  indicated  (Fig.  25). 

We  then  proceed  to  the  square-root  transformation.  The  auto- 
regressive estimates  are  much  smoother  than  the  one  obtained  from  the 
weighted  Fourier  serici  method  (Fig.  27-29  compared  to  Fig.  20).  There 
are  no  bumps  in  the  right-hand  tail,  which  is  an  improvement  over 
Figures  24-26.  By  looking  at  Table  2.5,  we  can  narrow  our  choice  among 
orders  2 to  5 . The  shape  is  pretty  much  the  same,  but  the  location  of 

( 

! 


the  modes  is  moved  around. 
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Table  2.5  Some  parameters  of  the  Maguire  data  density  autoregressive 
estimator  (square  root  transformation) 


Order 

Coefficients 

Scale  Factor 

m 

alm 

/v 

a2m 

^ > 
B 

1 

(-0.0376,-0.5504) 

0.6956 

2 

(-0.0751,-0.7748) 

(-0.4011,0.0954) 

0.5773 

3 

(-0.0758,-0.8066) 

(-0.4605,0.1050) 

0.5733 

4 

(-0.0784,-0.8073) 

(-0.4644,0.1214) 

0.571  _ 

5 

(-0.0750,-0.8092) 

(-0.4743,0.1206) 

0,5662 

7 

(-0.1020,-0.8337) 

(-0.4989,0.1606) 

0.5361 

Figures  21,  25  and  29  have  their  second  mode  at  about  320  , 
like  the  kernel  estimate  in  Figure  18  and  the  quantile  estimate  of 
Figure  19. 


t 


This  set  of  data  is  taken  from  Bliss  (1967)  Table  7.1  . It  con- 
sists of  a 28-cell  histogram  for  the  lengths  of  survival  in  days  of  1110 
mice  inoculated  uniformly  with  malaria.  This  set  was  also  used  by 
Boneva,  Kendall  and  Stefanov  (1971)  (see  Fig.  Cl). 

VTe  use  it  here  to  compare  the  behavior  of  the  different  methods 
"ith  respect  to  grouped  data,  i.e.,  when  smoothing  a histogram. 

Table  C 

Bliss  Data 


Midpoint 

4.5 

5.5 

6.5 

7.5 

8.5 

9.5 

10.5 

11.5 

12.5 

13.5 

Frequency 

25 

90 

75 

69 

48 

36 

29 

30 

33 

44 

Midpoint 

14.5 

15.5 

16.5 

17.5 

18.5 

19.5 

20.5 

21.5 

22.5 

23.5 

Frequency 

29 

40 

51 

51 

71 

65 

78 

75 

48 

30 

Midpoint 

24.5 

25.5 

26.5 

27.5 

28.5 

29.5 

30.3 

31.5 

Frequency 

35 

17 

15 

13 

4 

6 

2 

1 
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Figure  Cl 
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2.4.1  The  Kernel  Method 

The  kernel  method  produces  once  again  a smooth  estimate  (Fig.  30), 
with  two  modes  at  7 and  20  . The  modes  stand  in  inverse  relation  com- 
pared to  Figure  Cl  . For  the  used  h , there  are  less  points  contributing 
to  the  estimation  around  the  first  mode  than  arouna  the  second.  A smaller 
value  of  h would  give  the  proper  relation,  but  it  would  also  give 
spurious  bumps. 

2.4.2  The  Quantile  Expansion  Method 

The  quantile  method  cannot  be  used  with  grouped  data  unless  one 
unravels  the  histogram  by  distributing  the  frequency  count  of  a cell  over 
its  width  or  by  assigning  to  the  midpoint  a multiplicity  equal  to  the 
frequency  count.  Using  the  latter  we  obtain  Figure  31. 

2.4.3  The  Spline  Method 

In  Figure  32,  there  is  an  artificial  mode  at  4 , outside  the 
observed  range,  created  by  the  initial  conditions  that  have  to  be  imposed. 
The  mode  at  7 is  still  present  though  obscured.  By  changing  the  first 
derivative  at  the  origin  of  the  quantile  function  from  50  to  22  , we 
eliminate  the  artificial  mode  (Fig.  33)  without  effect  on  the  second 


mode. 
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2.4.4  The  Weighted  Fourier  Series  Method 

There  is  no  difficulty  to  handle  grouped  data  in  the  weighted 
Fourier  series  method.  But  there  is  always  the  same  problem  with  the 
tails  (Figure  34). 

2.4.5  The  Autoregressive  Method 

Not  to  include  any  zero  cell  at  either  end  is  equivalent  to  hav- 
ing the  data  fill  the  interval  [-TT,TT]  . From  Table  2.6  and  the  previous 
pictures,  the  estimates  of  order  2 and  3 seem  reasonable  (Fig.  35  and 
36). 

If  we  include  a zero  cell  at  each  end,  the  data  points  are  con- 
tracted to  fill  only  93%  of  [-TT,Tr]  ; the  pictures  differ  very  slightly 
(Fig.  37  and  38),  except  at  the  tails  as  expected. 


Table  2.6  Some  parameters  of  the  Bliss  data  density 
autoregressive  estimator 


Order  Coefficients  Scale  Factor 


A 

A 

A 

m 

alm 

*2® 

K 

m 

1 

(-0.1437,-0.0767) 

0.9734 

2 

(-0.1484,-0.1311) 

(-0.1316,0.3082) 

0.8640 

3 

(-0.2170,-0.1535) 

(-0.1314,0.3508) 

0.8240 

4 

' -0.2482,-0.1578) 

(-0.0832,0.3769) 

0.8064 

5 

(-0.2629,-0.1641) 

(-0.0607,0.3927) 

0.7967 
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2.5  Hazard  Estimation 

The  Maguire  data  set  can  also  be  studied  fiom  the  point  oi  view 
of  hazard  estimation.  It  is  difficult  to  judge  whether  the  estimates 
produced  by  the  autoregressive  method  are  reasonable.  But  as  there  is  a 
one-to-one  relationship  between  the  hazard  function  and  the  density 
function,  we  can  make  the  comparison  in  terms  of  the  density. 

Our  best  estimate  of  the  density  was  Figure  21  which  gives  the 
hazard  function  on  Figure  39,  obtained  by  the  indirect  estimation 
procedure 


h(x) 


f <x) 

tn 

x 

1 - f f (u)  du 


Table  2.7  Some  parameters  of  the  Maguire  data  hazard  function 
autoregressive  estimator 


Order 

Coefficients 

Scale  Factor 

m 

A 

alm 

A 

°2m 

A 

K 

m 

1 

(0.4214,-0.2538) 

3.5558 

2 

(0.4873,-0.3144) 

(0.0511,-0.1748) 

3.4378 

3 

(0.5085,-0.3099) 

(0.1114,-0.2080) 

3.3893 

4 

(0.4864,-0.3551) 

(0.0677,-0.2978) 

2.7822 

5 

(0.4262,-0.4846) 

(-0.0282,-0  3139) 

2.7696 
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A look  at  Table  2.7  would  leave  the  choice  to  be  made  among 
orders  2,  3 and  4 (Fig.  40,  41  and  42).  From  these  hazard  est  lates 
we  can  evaluate  the  density  indirectly  by 

/v  X 

f(x)  = h (x)  exp(-  J h (u)  du) 
m 0 m 

By  this  process,  we  obtain  Figures  43,  44  and  45. 

As  Figure  42  is  closest  to  Figure  39,  so  is  Figure  A 5 to 
Figure  21. 

We  note  that  the  direct  estimation  of  the  hazard  function  is 
difficult  at  the  right-hand  tail  where  it  becomes  infinite.  Also,  in 
the  indirect  processes,  we  have  to  perform  some  numerical  integration 
we  used  the  trapezoidal  rule  which  should  be  adequate  as  long  as  the 
function  we  integrate  does  not  have  too  many  sharp  teeth. 


c 


- 89  - 


2.6  Estimating  the  Density  via  the  Quantile  Function 

There  is  still  another  route  open  to  estimate  the  density 
function:  via  the  quantile  function. 

We  use  the  Fourier  transform  of  the  sample  quantile  function 
QQ(t)  to  get  the  autoregressive  estimate  of  order  m of  the  sparsity 

a A A 

function,  q^t)  . We  integrate  q^(t)  to  get  Q^t)  . 

Now  using  the  relation  given  in  (O.A.1.2),  the  density  estimator 
is 


f(Qm(t)) 


1 

O (t) 
m 


We  illustrate  it  only  on  the  Maguire  data.  Figures  46  to  49. 

The  general  shape  is  well  preserved  (compare  with  Fig.  21-22).  Table  2.8 
lists  some  of  the  parameters.  Orders  2,  3 or  4 seen  likely  candidates. 


Table  2.8 

Some  parameters  of 

the  Maguire  data  sparsity  function 

autoregressive  estimator 

Order 

Coefficients 

Scale  Factor 

A 

A 

A 

m 

alm 

a2m 

1 

(0.6615,0.2498) 

814.9191 

2 

(0.3546,0.3447) 

(-0.4534,-0.0277) 

646.7340 

3 

(0.2663,0.3703) 

(-0.3986,-0.1115) 

620.2509 

4 

(0.2543,0.3741) 

(-0.3808,-0.1301) 

617.8502 

5 

(0.2505,0.3790) 

(-0.3766,-0.1491) 

611.7474 

6 

(0.2254,0.3736) 

(-0.3687,-0.1701) 

571.0632 

rio.4a 


Density  obtained  from 
sparsity  order  2 


IV* 
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2.7  Conclusions 

1.  We  can  produce  with  ease, using  the  autoregressive  method,  estimates 
that  compare  favorably  with  those  of  the  competing  techniques.  We  never 
get  negative  estimates  as  is  possible  with  the  quantile  method,  the 
spline  method  or  the  weighted  Fourier  series  method. 

2.  The  spline  method  can  be  very  tedious  to  use  when  problems  with  the 
knot  points  arise. 

3.  The  kernel  method  performs  very  well,  but  the  snape  of  the  kernel 
has  an  influence  on  the  shape  of  the  estimate. 

4.  In  the  autoregressive  method,  the  data  should  fill  at  least  90%  of 
the  interval  [-TT,rr]  . 

5.  Picking  the  best  order  is  made  easier  by  looking  at  the  output 
parameters.  This  is  another  advantage  over  the  weighted  Fourier  series 

A 

method.  The  quantile  method  also  produces  output  parameters,  the  X^'s  , 
mentioned  in  our  Introduction.  Sillitto  (1969)  gives  interpretation  to 
some  of  them. 

6.  A stricter  rule  to  pick  the  best  order  of  the  autoregressive  estima- 
tor would  be  preferable. 

7.  It  seems  that  transformations  of  the  data  can  improve  the  properties 
of  the  autoregressive  estimators,  notably  in  the  tails. 
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APPENDIX 

2.A.1  Sample  Programs  for  Estimation 

We  include  in  this  appendix  three  sample  programs,  one  for  each 
of  the  approaches  we  used. 

Each  program  is  divided  in  3 parts: 

I - Computing  the  estimated  R(»)  sequence 

II  - Solving  the  Yule-Walker  equations  in  AUTOREG  (see  appendix  to 

Cnapter  3). 

III  - Computing  the  estimate  of  the  density. 

1 program:  Estimating  the  density  of  Maguire  data. 

In  Part  I,  we  perform  the  square-root  transformation  of  the  data 
Y(»)  and  compute  the  estimated  R(»)  sequence  in  GCSPHI  , stored  in 
A(»)  . FREQ ( • ) contains  the  frequency  of  each  data  point. 

In  Part  II,  we  solve  the  Yule-Walker  equations.  F(»)  contains 
the  estimated  density  of  the  square  root. 

In  Part  III,  we  transform  back  to  the  original  scale. 

Subroutine  GCSPHI  is  equivalent  to  FOURSTI  except  that  it 
also  computes  the  square  modulus  of  the  R(*)  sequence  stored  in 
PHI2  . This  feature  is  not  needed  in  the  autoregressive  method. 

2n<*  program:  Estimating  the  hazard  of  Maguire  data. 


c 


In  Part  I,  CF(»)  contains  the  empirical  c.d.f.  and  F(») 


contains  the  first  difference  of  the  estimated  integrated  hatard 

log  (1  - F„C))  ■ 

Part  II  is  as  before. 

In  Part  III,  ve  transform  back  from  the  hazard  to  the  density. 

3rd  program:  Estimating  the  quantile  of  Maguire  data. 

In  Part  I,  FREQ(*)  is  the  frequency  of  each  data  point,  CF(») 
is  the  empirical  distribution  function.  Then,  FREQC)  is  modified  to 
become  the  first  difference  of  the  data  Y(«)  , i-e.,  the  first  differ- 
ence of  the  empirical  quantile  function. 

The  two  other  parts  are  as  before. 

In  these  programs,  we  always  rescale  the  data  to  [-tt.tt]  using 


subroutines  CENTER  and  XSCALE  . 
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r*OG,\ff  V 7 ELSE  ( Ii,FLT  ,OuTcLT  ,TAPE5=INPUT,TAPE6=0UT°UT) 
Oit:rNSiOfj  FP-ntg-?) 

CO  "r  LEX  «(cr>,ALFHAC2'-),l=rtl<£t)(JH,!<H 
JxlENSILN  -(1  1)  ,0  |_:j) 

Ji  MENS  ICN  X(i.i),F(ln£| 

JlNEioTUM  V(i:" 

DIMENSION  Gll'-ii 

ci=a.»atan  a.;  > 
tw^p:  = 2 ,*pi 
M=  7 
M=M-1 
L=i 
NI=Q3 
N = N I 

PEADO.iCC)  (YlI),l=i,N> 

icr  fgkmat  ofi:  ) 

R£AD(5 ♦ 1c 1 ) <FR£Q(I> ,1  = 1, Nl> 

1C1  FORMAT C8FIC.O 

00  4 1=1, NI 
YUI  = SCF.TIY  (I)  ) 

F.\s,Q  ( I ) =FFcG(I»  /129. 

4 CONTINUE 

XKI0=C  .5*  (Y(l)  4-  Y { N ) ) 

X PMG=  Y ( N i - Y CI- 
SC L=ThrPI/XRNG 
CAl.L  C-NTEfi(Y,Y  , N , XM 1 0 ) 

CALL  XSCALtC Y» Y,N,SCL> 

CALL  ORPHIC  Y,K,R,C,PHl,F*tQ,M.U 
A(i>-C'-’PuX(i.,C.) 

CO  10  J = 2,M 

AC  J)=CMFLX(R(J-1)  ,ClJ-ii) 
lc  CONTINUE 
K=  10  3 
FK  = K 
NP  = K 

GO  1 J=2,f* 

00  13  1 = 1, K 

X C I) **° I ♦ ( I* 1) * TWCP I / FK 

13  C0NTI.NUE 

CALl  AUTCREG  (A,  J,M,K,AL.: 

CA^L  XoCAlEC X, X ,-K, SCl) 

GALL  CENTEk(X,X ,-K,XMIC) 

CAuL  XSGALE(F,F ,K,SCL) 

00  24  1=1, MP 
F,i)sC.5*FCI)/XCI» 

X t J. I = X C 1 1 X C Ii 
24  CONTINUE 

1 CONTINUE 


SLiiROUmE  GCS3HI  (X,N,-CH[  FHI?,FP£4,M, 

D I Mo  FSICM  X U)  iCFHI  m,SPHIIl)  ,FH?(1) 
Co'-lFcdX  FH I 2 

ctmnsicn  r-'£u(Nj 
3 I=*.*ATAN(1.U) 
n=flcat<n> 

EI=FlCAT  CL  ) 

*L  = «*  L 

OC  FT  Jsl.rtL 
CFHU  j ) = C. 

SPHfJ)sB* 
sC  £ C NT  INU  E 

DG  lb H I = i » N 

I F (AaSlX  an  .GT.PI1  SC  TO  1G* 

X X =X  (I  )/EL 
C 1=CG3 { XX) 

C0=2.*C1 
C 2 = 1, 

Sl=SlNlXX) 

S2  = C. 

SFhx(l>=CPHl  (D*Cl*FkEG(i) 

3 F hi 4 1) =SPHI ( i) *Sl*F9£  }t  I) 

If  wi-ll  101*  lulf  1G2 
1C  2 : CM  INJE 

DG  *5 a J=2,«L 

: PFH  J)  = CPHI(  Ji  *C$RFGtrG»Cl*C2)*Fit£Qt  I) 
SPMIU)  = SPHX  CJ>*CSREu(C0*Sl,S2)*FREV  I) 
iiP  : CM  I.NJt 
ICil  CQNTIhCE 
100  CONTINUE 

DC  200  J =1  * KL 
C = CPH(  Jl 
S=SPH  (J) 

PHI21  J>=C*CtS*S 
ECO  continue 
iC  TO  5 

lu  h W R ITE  ( o » in  3) 

1C  3 FoR^ATI1H1,2X,24HJaTA  NOT  SCALED  °°DPEFLY) 

5 CONTINUE 
R-  TURN 


ty  uj 


50^0UTUE  CEN  T F"  l * , v , 10) 

"I  ,;E  NS  ION  Ytl) 
nIM5NS  I3N  A (i  * 

X-  T=XMID 

I-(N.jT.O)  xFf=-X1  ID 
*NslA3S(N) 

DO  9 I = L , nt) 
r(I)=X(I)*xFT 
i THPfi 
x3 


SIJOPOOTINE  XS-hLE  ( X,  Y,  N,  SOL  ) 
31  Me.  NS  ION  yill 
DIMENSION  X (1) 
pT  stL  = SC  L 

irCN.LT.3)  PISCL  = 1 ./PISCL 

MN=IABS( Ni 

DO  10  1=1,  NM 

t ( I)=X  (H*PISCL 

3£TUPN 

* '13 
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-- OG-.'.  h T 6. t,S  = l IN^L  T ,uo  t PUT  , T fl  = £9  = INPUT  , TAPE6=0UT»UT  ) 
JIv£r<SIG>,:  FP£P  ( 93  ) 

'"n.‘Fu"X  A i 2D  , ALPHA  i*>  , ) ,rn  I U 1 ) , JH,KH 
J I y~NSI  C*"'  r fi~  > ,C  (1r) 

?1M£NSIC»!  A(l.Z)  tF(lCG) 

^pension  v ( i c 3 i 
TI^.-SICN  CFCi.v»,G(iC*l 

°I=L.*AT feK  (i.r  ) 

T*C°I=2 , -PI 
^ = 5 
L=i 
MI  = 9? 

N=Nl 

«seo(5, ; j wm,i  = i,M) 
ir  FORMAT  (8F!  '•.'•) 

READ(5,irl)  (FREQQ) ,1  = 1, Nil 
Iwi  FORMAT (8Ft:,  j ) 

CF 11) =FREC<1) /113. 

F (lJ  = -flLnG  <1 . -CF(1)  ) 

OG  8 1=2, HI 

CF  (I)  =CF  (i-li<-FREC(I)/UC  . 

F (a)=-AuOG  (l.-GF(I)  )<-A  uCG(  1 * -CF  tl-D  ) 
n oGNTINuE 

XKI3=. ,5"  (Y(l)fY(N)  ) 

X-* G=  V ( N ) - Y ( 1 ) 

SC  L=TwOP I / XRNG 
C«LL  CENTEPtY, Y,N,XMIC) 
wmLL  aSCxLc.  <Y  , Y ,N,SCt) 

CALL  GCSCHI£Y, N,c,rtpHI,F, P,L) 

A ( 1)  =CK  PuX  {-ALC6(  l.-CF  (NI»  )-F(l)  ,?.) 
cn  i?  j=2,m 
« i j)  = cp.plx  (R ( j-i) 

i-  rchTiMUE 
K = 1CC 
FK=K 

FNF=NC 


->L 


HsPI/FPO 
DO  1 J=  2 , ** 
rjO  13  1=1,  K 

«ThCPI/FK 

CONTINUE 

CALL  AUTCRCG  CA, J,rt,K, ALPHA  ,PrtI , JH,KH ,X ,F > 
G ( 1)  =F  ( 1 ) * W 
OC  cu  1 = 2, NP 

G(II  = otx-j,)*F*  (F  ll-l)  tF  (I)  ) 

CONTINUE 
uC  2E  1 = 1, »P 

F ( I)  =F  { I)  »:)io  (-G(  I)  » 

CONTINUE 
r 0 )7  INUS 
CN  9 
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“VJSRAN  TENSE  (lNPbT,Cl;iPUT  »T  AP  c 5 = T^U  T ,TAPE6=0UTPUT  » 
COMPLEX  A12C) , ALFrtA(?r ) , Fn 1 (2u ) .JH,<H 
DINENSTCK  ROC)  ,C  UO) 

3 1 NE  NS1 CU  X <10  21  , F (1G2) 

0 1 ME  13 1 CN  r t 200)  ,FR£Qt2CC) 

OIrtcNSICU  CF  t 130  ),  r,(lCD.) 


P I=t.*ATAN  (1.C-) 

r w^pi=2 . *pi 

i-7 
L = 1 
N 1 = 93 
N = N I 

READ(5,xC0)  <Y(IJ,I=1,N> 
loC  FCFMAT { 3F1j. 0) 

R E AO  {5 » 1 Cl ) iFREQ  (I)  , 1 = 1, N I) 

1C1  F CkHAT (SFlu  «0 ) 

:f  (D=FRcG(i)/iia. 

- R£Qll)=r(l) 

30  e 1=2, NI 

CF  ( I)  =CF  (I-1J  fFR£G  ( I ) /txG« 

FRcQ(I)  = Y<lJ-Y  (I-U 
8 CONTINUE 

X*IC=  t C F C M > +CF  (1 ) ) *0»  5 
X RNb=CF(N)-GFll> 

5Ci.=  TW0PI/XfiNG 

CALL  CENTE<v(Cr,CF,M,X^I3) 

CA»-L  XSCALE(CF,CF,NI, SCL) 

CALL  uCSPhl  <CF,NI,fi,C  ,FhI,  pKEG,N  ,L  > 
DC  15  J = 2 » M 

A l J)=CMFLX (R( J-ll , CCJ-l) ) 

15  CONTINUE 

a ll)  = CMFLX  (Y  (NI  -Y(l>  ,0.  ) 

F R IN  T*  , A 
< = 130 
FK=K 
>1  F=K 
FNf=NF 


30  1 J=2,M 
3C  13  1 = 1, K 

X (I)=-FI» (I-ll *TW0PI/FK 
1*  CCNTINUc 

CALL  rtUTCtiEGlA,  ALFmA  , PHI , J l,KH , X ,F| 

dkINT“,  ALPHA  ,KH 
X ( li  = Y (1» 

*(1)  = 1./F(1) 

3 U 14  1 = 2, NF 

X ( I)  = X(I-1)  ♦F(I)*TW0PI/  FN  F 
r (I)  =l./FCIi 
1>*  C Cl  T INUE 


1 CONTINUE 


E NO 


- - 


i 


PART  II  - THEORETICAL  RESULTS 
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Before  establishing  the  practicability  of  the  autoregressive  method, 
we  had  briefly  mentioned  in  the  Introduction  two  possible  interpretations 
(namely  an  orthogonal  polynomial  interpretation  and  an  autoregressive 
covariance  modeling)  to  give  some  Insight  as  to  why  this  method  could  be 
used  at  all  in  the  estimation  of  certain  functions.  We  will  now  develop 
these  interpretations  as  they  really  open  the  way  to  the  understanding 
of  the  theoretical  properties  of  the  method. 

Typically  the  statistical  evaluation  of  an  estimation  procedure  is 
the  study  of  its  convergence  properties.  This  can  be  done  quite  often 
in  two  parts:  first  the  deterministic  part  or  study  of  the  bias, 

second  the  stochastic  part. 

Paralleling  Part  1,  we  will  first  answer  questions  about  the  bias, 
that  is:  How  good  is  the  autoregressive  method  as  an  approximation 

method?  Then  we  will  consider  the  consistency  problem  30  as  to  complete 
the  picture  of  the  autoregressive  method  as  an  estimation  method. 

In  the  process  we  will  try  to  resolve  the  unanswered  questions  with 
which  we  concluded  Part  I: 

— What  is  the  best  route  to  estimate  a function? 

— Should  we  transform  the  data? 

— In  the  case  of  a density  function,  should  we  proceed 
directly  or  via  the  quantile  function,  or  via  the 
hazard  function? 

— How  does  averaging  successive  orders  help  to  reduce  the  bias? 

— - Can  we  explain  the  "odd-even”  phenomenon? 

— Why  does  symmetrizing  work  better  in  near  exponential  situations? 
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j * Note  on  our  numbering  system 

-«c'|  t 

Let  a stand  for  a chapter, 

b for  a section  of  a chapter, 

x for  a subsection  within  a section, 

y for  the  rank  of  an  equatior.  in  a subsection. 

All  equations  will  be  numbered  (x-y)  . 

For  reference  purposes,  we  will  use  (x*y)  when  the  reference 
appears  within  the  same  section  (a*b)  as  the  equation.  Otherwise,  we 
will  use  the  complete  identification  (a*b*x*y). 


CHAPTER  3 


INTERPRETATIONS 


3.1 


Time  series  interpretation 


3.1.1  Moving  average  process 

Let  Tt  be  the  set  of  all  integers.  Let  {Y(t),te2:}  be  a 
comp lex- valued  stationary  time  series  with  covariance  function 
By(v)  = E[Y(t)  . Y(t  + v)]  . 

We  say  that  Y(»)  is  a moving  average  process  if  there  exists 
an  orthogonal  process  {e(t),t€i}  with 


E[C(t)]  = 0 


R (v)  = E[e(t)  . e(t  + v)]  = cr  6 _ , a2  > 0 

€ t v,u  £ 


(where  6^  ^ is  the  Kronecker  delta  function) 


such  that 

00 

(1.1)  Y( t)  « £ 8ke(t  - k)  . 

k= 

Define  the  lag  operator  L by 
LJ€(t)  = e(t  - j) 

then,  (1.1)  can  be  rewritten  as 

► 

Y(t)  - h(L)  e(t) 

d-2) 

where  h(L)  * E 

k*  -oo 


I OH  - 


In  other  words  we  say  that  Y(t)  is  the  output  of  a filter  h(L) 
with  input  €(t)  . Moreover,  we  have  that 


(1-3) 


Thus, 


RyCv)  = E R^'s)  R (s  * v) 


where  B^(s)  - S 0kS,+k 
k=  -a 


d-4) 


Vv)  * S 6s0s+v 

S=  -oo 


In  view  of  the  convolution  formula  (1.3),  it  is  useful  to  consider 
in  turn  the  Fourier  transform  of  Ry(*) , f^(»)  , defined  by 


(1-5) 


1 — i vx 

fy(x)  ~ ^ E e Ry(v) 

v= 


We  can  write  it  automatically  as 


fY(x)  = fh(x)  . fg(x) 


(1-6) 


where  f^OO  ~ E e iVXRjj(v)  ~ jh(eiX)| 

Vs  •« 


Vx)  - b = e'ivxVv)  * m 

V*  -00 


Thus, 


- I 65“- 


(1.7) 


fY(x) 


ge  , . , ix. . 2 
2TT  !h<e  >1 


We  also  have  (if  Fy(*)  is  absolutely  continuous) 


(1.8) 


Vv)  = J*  eiVXdFy(x)  = J eivxf  (x)  dx  . 

-TT  -TT 


TT 

-TT 


Because  of  (1.8),  we  call  f^C*)  the  spectral  density  of  the 
ix  00  ikx 

process  Y(«)  ; h(e  ) ~ E e Is  called  the  transfer  function 

k8®  “oo 

of  the  filter  h(L)  , as  it  is  the  link  between  the  time  domain  represen- 
tation (1.2)  and  the  frequency  domain  representation  (1.7). 

It  is  time  that  we  worry  a little  about  the  meaning  of  all  those 
infinite  operations  we  have  been  performing. 

The  process  Y(*)  has  finite  variance  if  and  only  if 
“ 2 

E jgjJ  < 00  > and  then  (1.1)  is  defined  in  mean  square.  If 


E |ek|  < ® , then 

k*=  -oo 


s ivv)|  <f  i |skl| 

V®  -CD  J_k=  -00  j 


< « 


and  (1.5)  will  converge  pointwise,  a.e. 


oo  oo  2 

E |0k|  < - Implies  B |y  < - , 

k=  -oo  k*  -« 


it  turns  out  that  E jgk|  < 00  is  a sufficient  condition  for  all  our 

k=  -oo 

operations  to  be  valid. 


3®3t-‘a  • 
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We  will  say  that  Y(»)  has  a moving  average  representation  in 
terras  of  the  past  if  = 0 for  all  k < 0 in  (1.1).  We  will  say 

that  Y(»)  has  a moving  average  representation  of  order  q if  in  addi- 
tion 3,  *=  0 for  k > q and  g 0 in  (1.1);  usually  we  normalize  by 
k q 

gg  = 1 and  use  the  notation 

q 

(1.9)  Y(t)  = e(t)  + S 3kne<t  - k> 

k=l  R 


( 


3.1.2  Autoregressive  process 

Let  (Y(t) ,tel}  be  a comp lex- valued  stationary  time  series  with 
covariance  function  R,^(  •)  . 

We  say  that  Y(.)  is  an  autoregressive  process  if  there  exists  a 
filter  g(L)  and  an  orthogonal  process  {”n(t),t6Z} 

ElTJ(t)]  = 0,  R^(v)  = E[Tl(t)  11(t  + v)]  - O^6v>0  , 0^  > 0 


such  that 


(2.1)  Y(t)  = ^~Tl(t) 


where 


g(L)  - L a Lj 
j*0  J 


a = 1 
0 


1 


-lo  7- 


* 


We  usually  write  the  equivalent  formula 


(2.2)  Y(t)  + 2 a,  Y(t  - j)  = M(t) 

j=l  3 

Note  that  the  filter  g(L)  does  not  allow  the  Yi»)  process  to  depend 
on  its  future,  but  it  is  conceivable  that  h(L)  = l/g(L)  would  allow  the 
future  of  the  "n(»)  process  to  enter  in  (2.1).  We  will  guard  against 
that  by  asking  that  the  roots  of  g(z),  (r”1}  ? be  a1*-  outside  the  unit 

circle.  Then, 


(2.3) 


h(L) 


1 = 1 
g(L) 

tt  (1-r  L) 
3-1  J 


k\ 

E (r.L)k 

k=0  3 


which  provides  a moving  average  representation  in  terms  of  the  past. 


Tt  is  difficult  to  express  R^(»)  in  terms  of  the  filter  h(L) 
as  in  (1.4),  but  if  we  post -multiply  both  sides  of  equation  (2.2)  by 
Y(t  - k)  , k = 1,2,3,...  , and  take  expectation  of  both  sides,  we  obtain 
the  following  linear  relations 


Ry(0)  R^l)  Ry(2)  • • • 

r ■ 
°i 

y-D 

RyC-1)  1^(0)  1^(1)  . . . 

tt2 

=5  — 

y-2) 

y-2)  ^(-1)  yo)  . . . 
• 

°3 

• 

\<-3) 

• 

• 

• 

which  are  called  the  Yule-Walker  equations.  When  we  post-multiply  by 


Y(t)  and  take  expectation  we  obtain 


C 
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* 

K 


(2.5) 


E a.  RyCj) 

j=0  J 


2 

°T) 


In  establishing  these  foruulas  we  make  use  of  the  fact  that 
Y(»)  has  a moving  average  representation  in  terms  of  the  past,  which 
implies  that 


(2.6) 


E[Tl(t)  Y(t  - k)]  = 0 

< 

Epl(t)  Y(t)]  * c* 


for  k > 0 , all  t 


The  matrix  on  the  left  of  (2.4)  is  an  infinite  Toeplitz 

matrix. 

Finally,  as  in  the  previous  section,  the  spectral  density  is 


(2.7) 


fy(x) 


2 

aT| 

2TT 


. . ix. , 2 

|g(e  )| 


We  say  that  Y(«)  is  an  autoregressive  process  of  cider  p if 

a.  = 0 for  j > p and  a 5^  0 in  (2.2).  In  this  case  we  prefer  to  use 
J P 

the  following  notation: 


P 

(2.8)  Y (t)  + 2 a.  Y(t  - j)  = r|(t) 

j=l  JP 


The  coefficients  now  satisfy  a finite  system  of  Yule-Walker  equations 


c 


(2.9) 


Ry(0) 

• . . Ry(P  * 1) 

alo 

• 

RyM) 

Ry(l-P) 

. . . Ry(0) 

a 

__PP_ 

Ry(-P) 

More  precisely,  the  covariance  function 
ing  difference  relation 


RyCO 


obeys  the  follow- 


(2.10) 


P 

E 

j“l 


Jp 


Ry(j  - V)  = -^(-V) 


for  all  v > 0 . 


It  is  clear  that  in  the  finite  order  case  there  is  no  problem  in 
any  of  the  operations  except  maybe  in  (2.9).  But  there,  if  Y(»)  is 
stationary,  the  covariance  function  Ry(»)  is  strictly  positive  definite 
and  thus  the  determinant  of  the  finite  Toeplitz  matrix  in  (2.9)  is  greater 
than  zero  (see  Fagano  (1973)  for  the  real  rase).  The  infinite  order  case 
Is  treated  as  the  limit  of  the  finite  order  cases. 

In  the  following,  we  shall  impose  that 
that  fy^(*)  is  integrable  as 


< 


to  insure 


Ri(0) 


-22  a + 1 1*  I2) 
^ j-1  j 


Pn  dx 

-TT  V*> 


where  we  anticipate  (3.2)  and  Table  3.1. 


-no  - 


3.1.3  Relations  between  moving  average  and  autoregressive  processes 
If  we  compare  (2.7)  with  (1.7),  we  sea  that  the  spectral  densi- 
ties of  autoregressive  and  moving  average  processes  are  almost  inverse 
of  each  other.  Let 


(3.1) 


£ar<x) 


ixN . 2 


fMA<X>  - V !h<*  > 


2TT  ix.  . 2 

|g(e  )\ 


Then  f^(»)  is  the  spectral  density  of  a moving  average  process  having 

22-2  -1 

filter  g(L)  and  with  = 4TT  ^ . Similarly,  fMA(«)  is  the  spectral 

density  of  an  autoregressive  process  having  filter  h(L)  and  with 

2 ._2  -2 
cu  = 4h  a 


To  any  spectral  density  we  associate  a new  fimction  fi(»)  * f ^(*)  , 
the  inv  rae  spectral  density  (we  require  f ^(.)  to  be  integrable,  so  it 
can  play  the  role  of  a spectral  density),  and  Ri(»)  the  covinverse  func- 
tion (or  inverse  covariance  function)  related  to  fi(»)  by 


(3.2)  Ri(v)  - J*  eivxfi(x)  dx 

-TT 


-III- 


( 


3.1.4  General  representation  of  a time  series 

Let  {Y(t),t£2}  be  a complex-valued  stationary  time  series 
with  spectral  density  f(»)  and  covariance  function  R(*)  , 


TT 


R(v)  “ J elvxf(x)  dx 
-TT 


Then  there  exists  an  orthogonal  process  {e(t),tel}  such  that 


(4.1) 


Y(t)  =-  S g c(t  - k) 
k*=  -“ 


where  S |0kj  < ® , 

k=  -® 


and  f(,)  can  be  represented  as 


(4.2) 


® ilex  2 

I S 8keikX| 

~ 2 k=  -ao 

f(x)  = ag  2tt » where  f(x)  < ® 


If  log  f(»)  is  Lebesgue-integrable  (which  requires  at  least 
that  f(*)  be  positive  and  finite  almost  everywhere  on  [-TT,tt]  ),  we 
have  that  = 0 for  all  k < 0 , i.e.  Y(«)  has  a moving  average 
representation  in  terms  of  the  past  and  f(#)  is  representable  by 


(4.3) 


00  iw  2 

1 S 0ke  | 

~ 2 k=n  K 

f(x)  = a ~ , where  0 £ f(x)  < ® 

0 * 


(Doob  (1953),  p.  577) 
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On  the  other  hand,  there  exists  a process  Y'(.)  having  covari 
ance  function  Ri(*)  and  spectral  density  fi(*)  = f *(•)  with 


jr 


Ri(v)  = J e1VX  fi(x)  dx 

-TT 


Thus  there  exists  an  orthogonal  process  {e'(t),te2.}  such  that 


f’ 


(4.4) 


f Y'(t)  = E a.  e'(t  - j) 
j*  “00  J 


where  E jo..]  < 

j®  -CO  1 


and  fi(»)  can  be  represented  as 


(4.5) 


I S a e 

?i00  = o’, 


ijx  2 


, where  fi(x)  < oo 


As  log  f i ( • ) 9 -log  f(*)  , we  have  that  = 0 for  all 
j < 0 if  log  f(«)  is  Lebesgue-integrable,  and  then  fi(»)  is 
representable  by 


(4.6) 


iS  v1*,1 

- o 


, where  0 £ fi(x)  < ® 


By  comparing  (4.3)  with  (4.6),  we  conclude  that  if  log  f(#) 
is  Lebesgue-integrable,  f(.)  can  be  represented  almost  everywhere  by 
either  of  two  forms:  — as  r moving  average  spectral  density 


-I  IH- 


(4.7) 


S Bk  e 


f(x)  = a 


2 k=0 


ikx^  2 


2TT 


s w 

k=0  K 


— as  an  autoregressive  spectral  density 


(4.8) 


?(x) 


4TT  

2 

ae/  2TT 


. “ iix.^ 

s V “ 

j«0  J 


E |a  I < - 

j=0  J 


3.1.5  Time  series  interpretation  of  the  autoregressive  method 
We  start  with  a function  R(*)  that  is 


(5.1)  strictly  positive  definite 
and  such  that 

(5.2)  R(-v)  = R ly) 

It  is  well-known  that  such  a function  is  a covariance  function. 

Then  we  assume  that  there  exists  a function  f(»)  defined  on 
(.-TT.TT]  such  that 

R(v)  * J eivx  f(x)  dx 

-TT 


( 


-Ilf- 


Fur  thermore,  we  assume  the  existence  of  a hypothetical  complex- 
valued stationary  time  series  Y(.)  whose  covariance  function  is  R(») 
and  spectral  density  f(»).  It  is  always  possible  to  construct  a 
Gaussian  time  series  having  zero  mean  and  covariance  function  determined 
by  R(.)  . 

Assuming  that 


(5.3) 


TT 

- CB  < J*  log  f(x)  dx  < ao 
“TT 


.TT 


and  J* 


-IT 


dx 

f(x) 


< 03 


we  seek  the  autoregressive  representation  of  f(») 


Aw 

f(-) 


(5.4) 


f(x) 


< 


2 

2TT 


| E a e 

j*0  J 


ijx. 


2 

£ | a . | < eo 

j=0  J 


where  O' 

ao 


V 


£ a.  R( j) 

j=0  J 


, according  to  (2.5) 


Now  we  know  from  subsection  3.1.2  that  the  a's  and  the  func- 
tion R(»)  are  related  via  the  Yule-Walker  equations  (2.4).  But  chat 
system  is  infinite;  so  we  consider  successive  approximation  by  finite 
orders  as  in  (2.9) 


fp(x) 


K 

_R 

2tt 


(5.5) 


|1  + £ a eijx|' 
' j»l  jP  ' 


S *0) 


j=*0 


jp 


% 


1 


3.2  Orthogonal  polynomial  interpretation 

3.2.1  Theory  of  orthogonal  polynomials  on  the  unit  circle 

Let  F(«)  be  a nondecreasing  bounded  function  with  infinitely 

2 

many  points  of  increase,  defined  on  . W*.  denote  by  the 

space  of  measurable  complex- valued  functions  g(»)  such  that 


rP  ix  2 

j |g(e1X)p  dF(x)  < - . 

-TT 


It  is  well  known  that 


with  the  following  inner  product 


J u(eiX)  v(eiX)  dF(x) 
-TT 


2 

u(.)  and  v(.)  eL 

F 


is  a Hilbert  space. 

2 2 n 

If  we  orthogonalize  in  L the  set  of  powers  {l,Z,Z  ,...,Z 

r 

we  obtain  a set  of  polynomials  (cPg(»)  ,Cp^(*)  ,cp2(») , . . . jCp^.)  , . . .}  that 
are  uniquely  determined  by  the  conditions  that 

n 

(1.1)  cp  (Z)  = S a . Zn"J  , a > 0 , for  all  n 

n j=0  jn  On 

(1.2)  ^(O^C^F  ° 6jk  ’ for  311  J k • 

In  order  to  construct  the  polynomial  £n(*)  , we  define  the 


characteristic  sequence  R(»)  by 
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I 


,-r - 


\ 


( 


(1.6) 


R(0)  R(l)  ...  R(n-l) 

• • 

a 

nn 

R(“', 

• 

• • 

• • 

if 

: 

R(-  n + 2)  R(-  n + 3)  . . . R(l) 

a2n 

R(2) 

R(-n  + l)  R(-  n + 2)  . . . R(9) 

* 

aln 

R(l) 

- 1 ___ 

- 

On  JT.  n « 

inx  ■ - ,*  i(n-j)  x.  - 


p , inx  , _ * 

J iC  + P.  ajn 6 


-TT 


dF(x) 


Thus  Cp  (Z)  =*  an  (Zn  + a*  Zn_1  + . . . + a*  ) , aft  >0 
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Consider  now  the  subspace,  of  1^,  , generated  by 
,Cpx(.)  , — *C?T1<*>)  and  denoted  by  L*  . 

2 

Ln  is  a reproducing  kernel  Hilbert  space,  that  is  there  exists 
a function  Kn(»,«)  of  two  complex  variables  such  that 


Kn(.,y)  CL* 

(1.7) 

Ka(.,y)  - ^ Vy)  9j(#) 

g(y)  , for  all  g(.)  e I** 

We  can  obtain  an  explicit  representation  for  K^C* ,y)  by  solving  the 
following  normal  equations: 


(1.8) 


J-V 


|g(*),Kn(.,y)j 


i 
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(1-9)  (cpj(.),Kn(.,y)]F  = o.(y)  , j-0,l,...,n 

to  yield  that 

n 

(1.10)  ^(..y)  = E <P.(y)  cp.(.) 

j=0  2 2 

At  the  same  time,  we  have  verified  the  reproducing  property  (1.8) 

2 

because  any  g(«)  6 LQ  can  be  written  as 

n 

8(0  = E g.Cp  (.) 

j=0  2 2 

It  is  clear  also  that  ^(..y)  is  a polynomial  of  degree  n 
(for  fixed  y ).  So  we  may  seek  its  equivalent  polynomial  representa- 
tion 


(1.11)  ^(Z.y)  = E a (y)  Zj  . • 

j=0 

Let  hj (Z)  = Z2  , j = 0,1, ...,n  . Then,  by  (1.8), 

(V*)’Kn(«,y>)F  = ^(y)  = , j “ 0,1, . . . ,n 

This  can  be  rewritten  as 


( 


- m- 


(1.12) 


2 R0  * i)  ■ y » j = 

-6=0 


n 


2 a.  (y)  R(-6)  = 1 . 

-6=0  'C'n 


In  particular  if  we  set  y = 0 , we  get  the  following  system 


(1.13) 


2 a (0)  R(j  “ -6)  = 0 , 

-6=0 


j = l,...,n 


2 atn(0)  R(-  -6)  « 1 
-6=0 


This  system  is  equivalent  to  (1.4)  with  the  followirg  identification: 


(1.13) 

a-4) 

j 

c > 

n-  j 

°u<°> 

< > 

a0n  * a-6n 

Thus  while 

V2>  - jlv 

Zn  ^ , we  have  that 

(1.14) 

1Cn(Z>0)  ’ a0u 

" j 

S a.  ZJ  , 

j=0  Jn 

(1.15) 

Kn<°-°>  - a0n 

“ 2 |cp- (0)  l 2 

j=0  J 

n 

- 2)  \* 
j=0 

jjl 


Let  Pn(Z)  = Kn(Z,0)/a0n  , then 


C 


(1.16) 


(<(.),<(.))F 


Kn(0,0) 


On 


= 1 


3.2.2  Some  interesting  properties 


A - Extremal  properties 

The  polynomials  cp^(Z)  and  Kq(Z,0)  are  also  related  through 

a minimum  norm  approximation  problem.  Suppose  we  want  to  find  the  best 

2 n-1  * • 

approximation  to  Z in  the  space  Ln  ^ , 23  c.  Z^  , such  that 


1 .JO 

S O*  ZJ(|| 
j=>0  J F 


J*  |einX-  23  c*  ei^X| 2 dF(x)  is  a minimum. 
-TT  j-0  J 


n 2 

We  know  that  our  answer  will  be.  the  projection  of  £ on  L , 

n-I 

n-l 

Let  g*(Z)  ® Zn  - 23  c*  Z^  . We  certainly  have  that 

. n i=°  J 1 

g (Z)  = 23  p.p.(Z)  , and  moreover  we  know  that  Bn  = , for  the 

j=0  J J ‘ a0r. 

coefficient  of  Z is  equal  to  1 . It  follows  that 


Hg*(0l!p  - S |Bj|2  * |Sa|2  - -f  . 

J*°  a0n 

tpn(2)  HJr(Z) 

But  attains  this  lower  bound.  Thus,  satisfies  all  the 

a0n  a0n 

rC 

requirements  for  g (Z)  and  is  our  answer.  So 


.TT 


If  J I 


inx 


n-l 


* 2 
'j 

be  said  of  the  equivalent  form 


e - 23  c . e‘‘J"|  dF(x) 

-TT  j=0 


is  a minimum,  the  same  can 


TT  . 9 

f*  , inx,  “ . -inx 

J |e  Me 

-TT 


— 
E 

j=0 


c. 

J 


dF(x) 
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or 


iTT  tl*  X ■ • / • \ o 

1-  s c*el  n ] *1  iF(r) 

-Tt'  j-0  J 


I 


Sow, 


n-i_  n a K (Z,0) 

1-  £ c*z*-J  - 1+  E ^2j  - ^0 > 

j-0  2 j=l  S0n  V°’0) 


n V2)  n 2 

In  other  words,  Z - — — is  the  projection  of  Z on  LT  ^ 


and  1 


Kq(Z,0) 

Kn(0,0) 


On 


is  the  projection  of  1 on  the  subspace  of  i/ 

n 


generated  by  {Z,...,Zn]  ; Yj0~0) 

xx  2 

Z and  its  projection  on  LT_^ 


is  the  squared  distance  between 


i 


i. 


( 


B - Recurrence  relations 

Let  H be  a Hilbert  space  with  inner  product  denoted  by 
(f,g)H  » for  f,g  in  H . We  denote  by  f(»|g)  the  projection  of  f 
on  the  subspace  of  H generated  by  g . By  analogy  with  regression 
theory,  we  extend  the  notion  of  partial  correlation  coefficient  to  the 
context  of  a general  Hilbert  space:  for  any  elements  f ,g,f^, . . . ,fQ 

in  H , the  generalized  partial  correlation  coefficient  between  f and 
g , given  fj,...,fQ  is  defined  as 


(2.1) 


f " f(»|fi»***  »fn) *g  “ g(%| fi» • • • »fn)) h 

:(f’8lfi»***’*n)  3 Ilf  - f (.| f v . . . ,fn)iiH . iig  - g(.| fr 7 . . 7y nH 


where  !lh|L 

tl 


(h,h)„  is  the  norm  of 
H 


in  H . 


-Ul- 


Then,  for  any  f , g and  h in  H , we  have  that 

(2.2)  f(.|g,h)  = f(.|h)  + f(.|g  - g(.jh)) 

To  prove  this,  it  is  sufficient  to  note  that  h and  g - g(*|h)  gen- 
erate the  same  subspace  as  g and  h , as  they  form  a basis  for  that 
subspace. 

We  make  two  applications  of  the  identity  (2.2)  to  the  case  of 
2 

our  Hilbert  space  Lp  - We  represent  7T  by  fQ  , for  n = 0,1,2,... 

The  first  application  is 

(2.3)  fn(*l£0>***»fr :-i)  = fn^*lfl’' * ' ,fn-l^ +fn^*lf0  ‘ f0^  f **‘*»fn-l^ 
which  translates  into 


(2.4) 


Cp  (Z)  Zcp  (Z)  K .(Z,0) 

n _ _n  n-1  n-x _ _ 0 

“ “ - ^ tr  /A  f\\  > ® 1*2,, 


On 


-a 


wnere  a ! 
of  (2.4). 


nn 


On 


is  found  by  equating  the  constant  term  on  both  sides 


The  second  is 


(2.5)  f0(*jfi,...,fn)  3 foM  fl»  * • * »fn-l^  +f0^*lfn" 


( 


which  translates  into 


-ur- 


(2.6) 


Kn(Z,0) 

Kn(0,0) 


K .(Z,0)  ZCP  (Z) 

. n- 1 . _ n- 1 - - 

' Kn-l'°'°>  aO,n-X  ’ ' 


where  0 = - — — is  found  by  equating  the  cc  • ficient  of  Z°  on  both 
a0n 

sides  of  (2.6). 


Thus, 


(2.7) 


*n<Z) 


ZVl<Z>  . V Kn-l(Z>0) 

S0,n-1  a0n  W°’°> 


(2.8) 


Kn(Z,0) 

Kn(0,0) 


K ,(Z,0)  a Z CD  .(Z) 
n-lv  * nn  n-1  ' 

K -(0,0)  a.  an  - 
n-lv  ’ On  0,n-l 


Moreover,  we  can  give  the  following  interpretation  to  the 


coefficient 


(2.9) 


‘ 'f0’fl^H/ ^fl’fl^H  ’ n * 1 


■r(fQ, fQ] f^, . . . , fn_i)  > n * 2,3, . , 


We  use  the  relation  (2.5)  in  the  modified  form 


(2.10)  fg(«jfi»..*»fn)  “ foMfi>-**’fn-l^  + ^[fn"  fnM  fl»  * * ‘ ,fn-l^] 


together  with  the  basic  property  of  the  projection 


(2.11)  (VVh  = (toHfl’*'  Vh 


i 


- U&- 


to  obtain  that 


(2.12) 


(f0~  ^0^*  1 f 1’ ' ' ‘ * fn-l^  ’ Vh 

(fn'  fn^*  I flf  * * * * fn-l^  * Vh 


The  second  element  f in  each  inner  product  can  be  replaced  by 
f - fn(«| f^, . . . ,fQ  ^)  and  finally  it  can  be  verified  directly  that 


"f0  ” 


>fn-l>" 


H 


thus  completing  the  proof  of  (2.9). 

l appendix  3.A.1,  we  use  (2.8)  and  (2.12)  to  obtain  a recur 
sive  algorithm  for  the  computation  of  Kn(Z,0) 

C - Asymptotic  properties 
Theorem  3.1: 


, tt  K (0,0) 

1 f n ’ 7 ikx  , p 

2TT  J £ T e dx  - J 

-TT  ]Kn(eix,0)j 


TT 


eikxdF(x)  , k - 0,±1, 


-TT 


Proof:  See  Geronimus  (1961),  p.  12. 


In  what  sense  i^  this  result  useful  to  us? 

We  know  from  real  analyst"  ' hat  any  monotone  function  F(») 
be  written  as  a sum 


F(.)  - Fac(.)  + Ffi(.) 


can 
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where  — Fac(»)  *s  a^so^ute^y  continuous  with  respect  to  Lebesgue 

measure 

— F (•)  is  the  sum  of  a step-function  with  a singular  function 
s 

Let  f(*)  be  the  derivative  of  the  absolutely  continuous  part  of  F(*)  . 
In  general.  Theorem  1 means  that  F(x)  and 

y°.°> 

j =-75 2 d9  have  the  same  first  (p  + 1)  elements  in  their 

-TT  |Kp(ei9,0)J 

characteristic  sequence.  In  the  case  F(*)  is  absolutely  continuous, 
the  Fourier  series  of  the  difference 

Kp(0,0) 

K)(e1*,0)|2 

is  of  the  form 


( 


E 

lkl>p 


-ikx 

e 


Now,  what  happens  in  the  limit  as  p -»  ® ? We  have  the  follow- 
ing theorems  that  we  take  again  from  Geronimus  (1961),  except 
that  the  notation  is  changed. 

Theorem  3.2: 


0 < lirn  K (0,0)  - E |CD  <0>| 2 = K(0,0)<® 

P-*co  F j=0  j 


if  and  only  if  log  f(*)  is  Lebesgue-integrable,  that  is  if  and  only 

if  J log  f(x)  dx  > -® 

-TT 


< 


- t 2.8"- 


Proof:  See  Geronimus  (1961),  p.  14-17. 

Theorem  3.3 : 

K (2,0)  1 - 

TT(Z)  = Urn  -*====  = lira  cp*(Z)  -firfirf,  S <P,(0)  Cp  (Z>  , |Zt  < 1 
p-*a>  /Kp(0,0)  p->®  P V^u»u'j=0  J J 1 » 

if  and  only  if  log  f(»)  is  Lebesgue-integrable.  The  convergence  is 
uniform  in  any  closed  region  |Zj  £ r < 1 

Proof:  See  Geronimus  (1961),  Chapter  II. 

Theorem  3.4: 


Also, 


lim 


rrCe1*)  r-»l"  TT(relx) 


exists  almost  everywhere 
in  [-TT.TT]  . 


f(x) 


21T  , . ix.  . 2 * 

)| 


a.e.,  in  (-TT,rr]  . 


Finally,  let  E * {xe[-rr,rr]  , 0 < f(x)  < ®}  and  define 


ixN 


Vx) 


rr(e  ) , x e E 

0 , xi  E 


Then,  TTg(x)  has  the  following  expansion  in  terms  of  the  orthogonal 


- 1 zi  - 


CO 

polynomials  {c?n(*)}n=0 


ix. 


TT  (x) — S c?  (0)  co  (e‘") 

e Vk(o,o)  -ig  vr  ; r ; ' 


which  converges  in  L^, 

Proof:  See  Geronimus  (1961),  Chapter  II. 


Theorem  3.5: 
Let 


then. 


2 


and  lim  6=0 

p 

p-*» 

QO 

Proof:  The  expansion  of  TTgCO  in  terms  of  {Cp^*)}^^  is* 

Theorem  3.4. 


a ’j(0)  ’>j<eiX) 


The  expansion  of  CPpCO  is 


•* 


from 
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cp*(eiX)  = “ • 2 9,(0)  cp  (eix) 

P Op  j=0  3 3 


and 


/K  <°,°>  * ix  i P ix 

/k<o7o>  V /K(0,0) jto  3 3 


as 


% - /y°-°> 


Now  by  Plancherel’s  theorem 


I liTj.  < • ) 


^^^!(eiX)IL='b=;fs  CP4(0)  «p,(eix) 


^K(0,0) 


V F 


VK(°.°)^ 


j=0 


jv  / 


P 

£ 

j=0 


CPjCO)  cp.(eix)jl 


So, 


V°>1 


And,  by  Theorem  3.2 

lim  6=0 
p 

P -*  CO 


Theorem  3.6: 

If  F(«)  is  absolutely  continuous  in  f-TT,TT]  , 

0 <mS  f(x)  £ m , a.e. , in  [-tt,tt]  and  f(.)  e Lip(l/2,2)  , then 

c: 


-131  - 


|cpn<z)i  * c 


|Z\  * 1 


Proof:  See  Geronimus  (1961),  Chapter  III,  Theorem  3.8. 


Note : 


f(*)  e Lip  (a,p)  if 


^\l/p 


wp(5‘.f) 


sup 

|h;<6 


LT  |f(x  + h)  - f(x)jpdx 


-tt 


0(6a) 


We  get  the  same  result  as  Theorem  3.6  if 


TT 


R(v)  - J eivxf(x)  dx  = 0(v  *) 

-TT 


Theorem  3.7: 


-TT 


If  1°8  F'(x)  dx  > - oo  , 6p  =*  o^— ) 

and  F(x2)  - F(x^)  ^ m(x2  - x^)  , for  m > 0 and  -TT  ^ x^  < x2  £ tt  , 

then  lira  Cp*(e1X)  = TT(e1X)  , a.e.,  in  [-TT,TT]  . 
p-»«o  p 

Proof:  See  Geronimus  (1961),  Chapter  V,  Theorem  5.1  . 


( 


In  the  case  F(»)  is  absolutely  continuous  and  f(»)  is 
bounded  above  and  below,  it  is  sufficient  to  have  w2(6;f)  * o(fb)  . If 
w2(5;f)  * 0(oa)  , tt  > % , then  the  convergence  is  uniform  (Geronimus 
(1961),  Theorem  5.2). 


We  can  reformulate  the  assumptions  of  some  of  these  theorems 
in  terras  of  the  sequence  of  partial  correlation  coefficients  {a  } by 
noting  the  following  results: 

Theorem  3.8: 

The  condition  ja^j  < 1 , for  n - 1,2,...,  determines  the 

CO 

entire  set  of  orthogonal  polynomials  {Cpn(*)]n_Q  UP  to  a mu^t:*-Plicat:i-ve 
constant  (cpg(*)  * 1)  and  thus  determines  a function  F(»)  , bounded 
nondecreasing  with  infinitely  many  points  of  Increase. 

Proof:  See  Geronimus  (1961),  Chapter  VIII,  Theorem  8.1. 


Theorem  3.9: 

CD 

2 

log  f (•)  is  Lebesgue-integrable  if  and  only  if  £ ja  | < ® . 

n*l 

Proof:  See  Geronimus  (1961),  Chapter  VIII,  Theorem  8.2. 


Theorem  3.10: 

3 

If  la  I < — : , for  n large  enough,  we  have  that  at 

I nnl  n log  n 

all  x where  f(x)  > 0 


■.  • *,  ix.  . ix. 

lim  cp  (e  ) = TT(e  ) . 

P-»od  p 


Proof:  See  Geronimus  (1961),  Chapter  VIII,  Theorem  8.4. 
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* 


¥ 


Theorem  3.11: 

If  la  | < 1 for  all  n , then. 
I nni  ’ 


iyz)i s aop  ■ 


Proof:  See  Geronimus  (1961),  Chapter  VIII,  Theorem  8.3. 


There  does  not  seem  to  be  any  condition  on  the  fa  } that 

nn 

would  yield  the  equivalent  of  Theorem  3.6.  Indeed,  we  have  the  follow- 
ing: 

Theorem  3.12: 

CO 

If  E la.  J < ® , then 

. i nn  i 
n=i 

|tpp(Z)|  ^ C , for  |Zj  £ 1 , and  all  p 
F(*)  is  absolutely  continuous  in'  [-tt.tt]  , 
f(x)  s m > 0 , 

f ( • ) is  continuous  and 

CD 

|CP*(Z)  - TT(z)j  * C'  . s |a..|  , |Z|  s;  l 

v j-p 

Proof:  The  first  assertion  is  a direct  consequence  of  Theorem  3.11. 

For  the  other  assertions,  see  Geronimus  (1961),  Chapter  VIII,  Theorem  8.5. 


( 


and 


So,  at  the  same  time  as  the  boundedness  of 
£**(.)“  ^ , we  get  the  convergence  of  C?^(e*X)  to  TT(e'*'X)  . This  is 

equivalent  to  the  combination  of  Theorems  3.6  and  3.7  with  uniform 


convergence. 
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3.2.3  Orthogonal  polynomial  interpretation  of  the  autoregressive 
method 

If  we  want  to  approximate  the  derivative  f(.)  of  a bounded 
nondecreasing  function  F(»)  , we  form  the  characteristic  sequence 

TT  . 

R(v)  = j e1VXdF(x)  , v = 0,1,..., n 

-TT 

from  which  we  obtain  the  orthogonal  polynomials  {cpj(»)  ,cp^(«) , . . . ,Cpn(*)} 
and  the  related  kernel  functions  {K^» ,0) ,IC^(»,0) ,. . . ,Kn(* ,0)} 

Under  the  assumption  that  log  f(.)  is  Lebesgue-integrable,  we 
have  that 


fp(x) 


1 Kp(0»Q) 

2TT  |Kp(eiX,0)  | 2 


is  an  approximator  of  f(x)  . 


In  the  next  chapter,  we  give  more  precision  to  that 
affirmation. 


The  estimation  problem  would  be  similar  save  for  the  estimation 


of  the  R(«)  sequence  usually  through  the  use  of  a crude  estimator  of 
F(.)  . 
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3.3  Correspondences  between  the  two  interpretations 


The  autoregressive  approximator  of  order  p 


(0.1) 


f^Oc) 


l + £aj»* 


and  the  approximator  of  order  p we  obtain  from  the  orthogonal  poly- 
nomial theory 


(0.2) 


fj2)(x) 


V_I’0) . 1 

| K (eiX,0) j 2 


Kp(0,0) 

2tt 


|aopj5,ajpe  i 


are  equal. 


Indeed  in  the  time  series  case. 


Kp  are  related  by  the  following  equations 


(0.3) 


S a,  i(j  ■ l)  = -R(-t)  , 


AaJPR<3>  = "p 


(see  equation  3. 1.2. 9) 


In  the  orthogonal  polynomial  case,  we  can  rewrite  the  system 


(3.2.1.13)  as 
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(0.4) 


j E a,  R(j  - D « -a  R(-t) 
j«l  JP  p 


an  E a.  R(j)  = 1 

OPj^  JP 


The  following  identifications  provide  the  equivalence  of  the 


two  systems: 


(0.5) 


K = 
P 


Kp(0,0) 


It  follows  that  all  the  properties  that  were  established  in 
subsection  3.2.2  have  a time  series  interpretation.  We  note  first  that 
from  the  definition  of  R(*)  in  both  interpretations,  we  can  make  the 
identification  of  Y(t  - v)  with  ZV  . Then,  from  subsection  3.2.2 
part  A,  the  best  (minimum  mean-square  error)  linear  predictor  of  Y(t  - p) 
given  Y(t  - p + l),...,Y(t)  , denoted  by  Y(t  - pjt  - p + l,...,t)  is 


(0.6) 


Y(t  - pjt  - p l,...,t)  - - E a Y(t  - p + j)  , 

j-1  JP 


and  the  best  linear  predictor  of  Y(t)  given  Y(t  - 1),...  Y(t  - p)  is 
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(0.7) 


Y(tjt  - i,...,t  •>  p)  = - s^ajp  Y(t  - j) 


The  recurrence  relations  of  subsection  3.2.2  part  B become 


(0.8) 


Y(t-p|t-p  + l,...,t)  = Y(t-p|t-p  + l,...,t- 1) 


and 


(0.9) 


Y(t|  t - 1, . . . ,t  - p)  - Y(tjt- l,...,t-p  + l) 


P-1 


app^aj,p-l  Y(t'P+j) 


(-a  ) is  the  partial  correlation  coefficient  between  Y 

PP 

and  Y(t  - p)  given  Y(t  - 1, . . . ,Y(t  - p + 1)  , and  by  evaluating 
(3.2.2.12),  we  obtain  that 


P-1 


P-1 


(0.10) 


- a 


Sa  R(j-P)  £ a R(j-p) 

1=0  3,p  1 1=0  3>p  1 


PP  P-1 

£aj.p-iR(j) 


p-i 


Kp  is  the  mean- square  prediction  error  when  use 
Y(t j t - 1. . . . ,t  - p)  to  predict  Y(t) 

Finally,  the  theorems  of  subsection  3.2.2  part  C apply  directly. 


} 
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In  particular,  Theorem  3.1  can  be  rephrased  to  say 


(0.11) 


f f(i)(x)  eikXdx  « /V^dFCx)  , \k\*  p 

-rr  p _TT 


and  Theorem  3.2  implies  that 


(0.12) 


K decreases  to  K 
P 


1 

K(0,0) 


as 


P *>  CO 


( 


\ 
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Appendix 


3.A.1  Recursive  algorithm 


We  provide  a recursive  algorithm  to  compute 


Vx)  ’ 2n  * 


|l  + s a e 
1 j-1  jp 


ijx.2 


given  , {a^  and  the  sequence  R(*) 


rrom  (3.3.0.10),  which  is  the  equivalent  of  (3.2.2.12),  we  have 


(1-1) 


a * - L ,R(j-  p)/K  , 

PP  jto  i»P_1  P-1 


From  (3. 3. 0.9),  which  is  the  equivalent  of  (3. 2. 2. 8),  we  have 


(1.2) 


a,  = aj  i + & a , , , 1 =»  l,...,p-l 

jp  j,p-l  pp  p-j,p-l  * j » 


Finally,  from  (3  .-1.5. 5),  we  have  that 


(1.3) 


kp  ' £°3pr(j) 


°0p  • 1 


Using  (1.2),  we  obtain  that 


i 
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K = K , + a s a . . R(j) 

P P-1  pp  P-J.P-1 


K 


P-l 


K 


p-l 


The  Initial  conditions  are  simply 


(1.4)  Kq  * R(0) 


00 


We  have  written  a FORTRAN  subroutine  that  computes  fp(“)  * 
This  subroutine  is  called  AUTOREG  and  can  be  found  on  the  next  page. 


c 
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SLirOJ  T I NL  hUTwiREu  ( k * \ ,M,NP*Ai-PHh,PhI,  JM  , K H » X , F) 

C 

: „OMPuTES  THE  -O'FFICIENTS  AlPHA  (.),  ACCORDING  T3 

C A 3ECJr\SiVE  A.GORIThM,  AND  Th-  CoRPESPuNDING 

C FUNCTION  Ft.)  AT  THE  i-oInTS  X(.) 

: INPUT- 

k = VtCTjP  uF  COMPLEX  FOURIER  TRANSFORM, 

C OF  DIMENSION  hT  LEAST  M 

C \ : (K-l)  IS  THE  ACTUAL  ORDER  OF  HE  SCHEME 

c B^InG  COMPUTED*  K.GE.2 

: M : (M~i>  IS  THE  MAXIMUM  QkDER  OF  SCHEME 

C Tu  3E  COMPUTED 

C X = X ECT 3°  0r  VALUES  AT  HrtlCH  Ff.l  Io  TO 

C BE  COMPUTED*  OF  DIMENSION  NP?  -PI.lt*  X.-E.PI  j 

C 3CFUT-  l 

C ALPHA  = SECTOR  OF  COEFFICIENTS,  DEFINING  THE 

Z A PPR3  X I MA  T iNu  FUNCTIuN*HAS  TO  BE  DlilEN- 

C SICnEO  AT  LEAST  M 

z F = V&CT3R  OF  VALUES  uF  THE  AFP ROXi MAT! NG 

C FUNCTION,  OF  CIMENSION  NP 

C A-HHA  * 3 HI  * JH,  <M  ARE  USED  RECURSIVELY,  THAT  IS 

THEIR  VALUES  AT  OUTPUT  FOR  K ARE  USED  AS  INPUT 
C FOR  KU) 

C 

DIMENSION  X ( 1 ) , Ft  1 ) • 

CGHPLS  X A ( j.  ) » A L3HA  ( 1 ) , FHl  (1) , JH, KH, G 
C 

T J OF  1-6.  2ool85o0 
JH=CMPLX  tO  . ,C. 1 
ALPHAt  1)  =CMPLX  Cl.,  j . ) 

L=<-1 

PHT (c)=CMPLX(l.,0.)  1 

IF ( K.t  Q.c  ) KH=  CON J G t A l 1) ) 

03  4 1 = 1, L 1 

, JH=  JH*-CON  JG  (Ati  *1)  *PrtI  CI)>  | 

G= - JH/ K H . I 

ALFrtA  CM  =G  | 

IF(L.cQ.l)  GO  TO  .5  I 

DO  2 1 = 2, L i 

2 ALPHAt  I)  = AlFHA  II(  + G*FHI(I*1)  j 

? C ONT INU  E | 

00  o 1=1, K j 

3 PHI  ( II  = CON  JG' ALPHA  tK  + l-I) ) J 

KH=  K.H- J M ‘CCNJG  (JH)  / CON  Jl»  C < H > j 

CL  = KH/T  WDF I 

oo  li  1=1, np  ; 

G=CMPLX(1.  ,0.)  ! 

03  12  J=2,K 

fj= j-i  ! 

U 5=G+r£XP  IC.IFLX  tc>.,  x CI> ‘FJM ‘ALPHA  <J> 

F<  I»  = CC/  IG-CUNJGIC)  I 

11  CONTINUE  | 

nETURN 
END 


The  properties  of  the  autoregressive  approximator  f^(.)  depend 
in  a large  measure  on  the  function  f(»)  being  approximated.  In  this 
chapter,  we  study  the  effect  various  hypotheses  concerning  f(*)  have 
on  the  behavior  of  the  bias  function 

b (•)  « f(.)  - f (.) 

P P 

especially  the  rate  at  which  it  goes  to  zero. 

In  this  chapter,  we  follow  Geronimus  (1961)  closely,  except  in 


Section  4.2. 


-lis- 


■f 


4.1  Autoregressive  representation 

4.1.1  Convergence  "in  the  mean" 

We  start  our  study  with  the  case  where  F(*)  is  absolutely  con- 
tinuous and  where  its  derivative  f(.)  has  the  following  properties 


(1.1) 


TT 

’ 0 < J f(x)  dx  = R(0)  < « 

-TT 

- ° < J*  d*  - Ri(o>  < «. 

JT 

-<n  < J log  £(x)  dx  < log  R(0)  < » 
-TT 


Under  these  conditions,  f(.)  has  the  following  autoregressive 
representation 


too  - 3s  • 


03  Hx  2 

ii  + s n 
1 i=i 3 


a > 0 


£ ia.l  < » 

j=l  J 


and 


fp(x) 


K 

JL 

2TT 


* T*  • 


X 

2rr 


ix. ,2 

IV*  >1 


Also,  £<*>  - ^ • )itCelx)(2 


a.e.  (theorem  3.4) 


f 
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Theorem  4.1: 


Under  the  condition  (1.1),  we  have  that 


(1.2) 


lim 

P -*  00 


1 1 
f(x)  ■ fp(x) 


f(x)  = 0 


and 


(1.3) 


lim 

p-*oo 


|f(x)  - fp(*)l 
fp(x) 


dx 


0 


Proof: 

Note  first  that 


1 1 

f(x)  - fp(x) 

f(x)  fp(x) 

f(x)  . fp(x) 

So  (1.2)  implies  (1.3). 
Now, 


1 1 
£<x)  ' fp(x) 


£ 


2tr 


2TT 


|7T(e1X)|2  - iP*(e1X)|2J 

|rr(eix)|  + |0*(eix)|J  • ^(e1*)  - cp*(eix) 
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rr  i 

r 1 


I y. 

i-r  . i(x)  dx  s 2TT  J |TT(eix)(  + |pV*>|< 


f (x)  * v J M ' i p ' I 

P -TT  l J 


. |TT(eiX)  - cp*(eix)  J . f(x)  dx 


Using  Schwarz'  inequality,  we  obtain 


f | • **  - 2"  + Kc,l3t)llrj 


. llTT(eiX)  - C0*(eiX)iL 


f n . ^ 

Recall  that  lln(.)l!  - [J  |H(eiX)  | 2 . f(x)  dx 

V-TT  ) 


We  have  shown  previously  that 


!icp*(eiX)IL  = 1 

P F 


(3.2.1.16) 


and  by  Theorem  3.4 


lbr(e  x)ll. 


T/<m 


o)  - l , 


lhf(eiX)  - cp*(eix)!lF  .> 

p *♦  00 


which  completes  the  proof. 
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\ 


Note  that  (1.2)  and  all  other  expressions  involving  can  be 

rewritten  using  the  special  notation  introduced  in  subsection  3.1.3, 
e.g.  for  (1.2) 


tt  lfi(x)  - fi  (x) I 
lim  J dx  = 0 


p-»eo  -TT 


fi(x) 


Thus  the  statements  about  f(*)  are  not  exactly  of  the  same  form  as  the 
statements  about  fi(«) 


For  our  second  step  we  add  some  conditions  to  insure  that  the 

00 

orthogonal  polynomials  {cp  (.)}  _ are  uniformly  bounded  and  so  also 

n n=u 

the  • 


Recall  that  f(«)  e Lip  (a, 2)  if 

f TT  2 

w0(6;f)  " sup  jj  | f (x  + h)  - f(x)|  dx 


(h|s;6  j -tt 


o(6a)  . 


(1.4) 


condition  (1.1) 


< 0 < m £ f (x)  s;  m < o , a.e.  in  [-Tr,TT] 


f(.)  e Lip  (1/2,2) 


Theorem  4.2: 


Under  condition  (1.4), 


( 


.TT 


(1.5) 


lim  J 
p ->  oo  _TT 


1 

f(x) 


fp(x) 


dx  = 0 


- 


1 


(1.6) 


lim 

p-»® 


J 


-TT 


f(x)  - fp(x) 
fp(x) 


dx 


0 


Proof : 


Note  first  that 


1 1 
f (x)  " f (x) 


!f(X)  - fp(x)|  ^ L jf(x)  - fp(x)| 

"fU)  • f (x)  * M f (X) 
p p 


and  so  (1-5)  implies  (1.6). 


Again 


1 

f(x) 


fp(x) 


s 2n 


^|n(eix) 


. ix.  *,  ix. 
n(e  ) - Cpp(e  ) 


But  under  condition  (1.4), 


lTT(e^X)l  £ , f(«)  being  bounded  below 

and 

]cpo(e^X)|  £ C , by  Theorem  3.6 


( 


So 


tt 


m • J* 


-TT 


1 1 

2 

dx  £ 

1 

1 

f(x)  fp(x) 

j 

-TT 

f(x) 

' y*> 

•f(x)  dx 


e'  • t 


v 

-TT 


> ixN  *,  ix. 
TT(e  ) - cpp(e  ) 


f (x)  dx 
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which  goes  to  zero  according  to  Theorem  3.4. 
This  completes  the  proof. 


A slight  modification  of  condition  (1.4)  will  provide  us  with 
pointwise  convergence. 

4.1.2  Pointwise  convergence 


(2.1) 


condition  (1.1) 

0 < m £ f (x)  ^ M < to 

f(.)  - 8(0 
w,(&;g)  - o(6a)  , 


a.e.  in  [-TT,TT] 

a.e.  in  [-TT,TT] 

|<a  * 1 


Theorem  4.3 : 


Under  condition  (2.1), 


_TT 


(2.2) 


and 


Proof: 


lim  J*  If (x)  - f (x)|  dx  » 0 

p-*a>  -rr  ^ 


lim  f„(x)  * ■—  * ~-^2  , uniformly 


P ->  08 


|TT(e1X)| 


(f(*)  • 17--2  » *-e*> 

^ |TT(eix)|2 


Under  condition  (2.1),  we  certainly  have  (1.4);  but  by  Lemma  5 
of  Ibragimov  (1964),  {JcpR(*) j 3™^  is  uniformly  boarded  from  above  and 
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below,  i.e.  0<b£  ]cp_i(ei3C)  | * B < =>  . Thus 


0<ai  f^(x)  ^ A < eo  , for  all  x and  all  p 


and  (2.2)  follows  from  (1.6). 

To  prove  (2.3),  we  note  that  Theorem  3.7  can  "be  applied  so  that 


lim  ^(e1*)  - TT(eix)  , uniformly 

P-»«.  F 


-CO 

and  by  the  previous  remark  f £ | cpp  ( • ) \ ip*Q'  is  uniformly  bounded  from  above 
and  below) 

J_  . . 1 _1_  1 

2rr  nlim  . *.  ix,.2  " 2n  • . .ix..2  » ^ifomly  . . 
P*-»«  l^p^e  )|  ln^e  '[ 

This  completes  the  proof. 

At  what  rate  does  the  bias  decrease  to  zero? 


(2.4) 


condition  (1.1) 

0<nSf()SH<B 

a.e. 

in 

[~TT,rr] 

f(*)  a g(0  » 

a.e. 

in 

g(*)  has  r derivatives 

- 

g(r)(.)  e Lip  (a, 2)  , 

0 < a 

- 1 
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Theorem  4.4: 

Under  condition  (2.1), 


|b  <x)|  = 0(p“P) 


Under  condition  (2.4), 


|bp(x)|  = 0(p"P) 


|3  < a - 1/2 


P < r +a  - 1/2  . 


Proof : 

This  is  essentially  Theorem  3.12  in  Kromer  (1969). 


4.1.3  Properties  based  on  the  partial  correlation  coefficients 

As  in  Section  3.2.2,  Part  C,  we  can  use  the  partial  correlation 
coefficients  to  describe  the  properties  of  the  bias  function.  We  obtain 
results  that  are  similar  to  those  of  the  two  previous  sections. 


Theorem  4.5: 


00 

E 

n=l 

let  I2 
1 nnl 

< 

CD  9 

lim 

« 

p -■>  ao 

r" 

1 

j 

-TT 

f(x) 

• f(x)  dx 


C 


and 
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lim 
P“>  S' 

Proof:  By  Theorem  3. 

Lebesgue-integrable. 


TT 


I 


-TT 


f(x)  - f (x) 


fp(x) 


dx  = 0 


9^  E jtt  | <oo  implies  that 
n=l 


So  we  can  apply  Theorem  4.1. 


log  f(*) 


is 


Theorem  4.6: 

I£  lannl  < n Tog-;  > for  " Utge  cn°u*h" 


lim  f (x)  = f(x)  , at  all  x such  that  0 < f(x)  . 

p-*00  P 


Proof:  By  Theorem  3.10,  we  have  that 


= TT(e*X)  , at  all  x where  f(x)  > 0 


1 ' = 1 
, *,  ix.,2  . , ix.  .2 

IVe  }l  ln(e  }l 

We  can  get  now  different  estimates  of  the  bias. 

Theorem  4.7 : 

00 

If  E |a  | < « , then,  F(*)  is  absolutely  continuous 

n*=l 


lim  cp*(e1X) 

P“»00  P 

and  so 


lim 

P -*  CD 


c 


f(x)  ^ m > 0 


> 


1 

f(x) 


fp(x) 


£ C 


S l°Wcl 

k*p 


Proof: 

The  first  two  assertions  follow  from  Theorem.  3.12.  Then,  es  in 
Theorem  4.2, 


1 

f(x) 


fpjx) 


S 2tt 


|Tt(elx)  - q£ceix)| 


* C 


S l°kki 

k=p 


by  Theorem  3.12. 


If  we  add  to  the  hypothesis  of  Theorem  4.7  that:  f(»)  is  bounded 

from  above,  then  we  can  prove  that  jf(x)  - f (x)|  goes  to  zero  uniformly 

09  P 

at  the  same  rate  as  E jo^  j . 

'k=T)  • r - - ...  . . 
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4.2  Fourier  analysis 

The  Fourier  series  representation  of  f(*)  is 


f(x)  ~ ■—  E R(v)  e”1VX 


and  the  Fourier  series  representation  of  f^C*) 


fp(x)  ~ i S RP(V)  e-ivX  , 

V*  -os 


where 


yv> 


R(v>  , 

|v|  * p 

v < -p 

Rp(-v)  , 

V > p 

Now  it  is  always  true  that  the  Fourier  series  representation  of  f^(.) 
converges  pointwise  to  fp(»)  for  almost  all  x in  [-TT,TT]  . This 
follows  from  the  fact  that  | Rp< v) | decreases  exponentially  (see 
Appendix  4.A.1). 

00  2 

If  21  |R(v)  | < as  , then  f(*)  is  square- integr able  and 

V=  -so 


TT 


lim  J |f(x)  - f (x)|t  dx 

p-»OD  -TT  p 


3 o . 
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1 I £ |R(v)|  < co  , the  Fourier  series  of  f(*)  converges 

V=  “CD 

pointwise  to  f(«)  and  we  can  write  the  bias  function  exactly  as 


1 

bp(x) 

II 

£1' 

• 

£ (R(v)  - R (v)) 
|v|>p  p 


-ivx 

e 


a.e. 


We  obtain  the  following  bound 


bp(x) 


And  so  for  almost  every  x in  [-Tr>TT]  , the  rate  of  fall- off  of  b^fx) 
to  zero  is  the  slowest  of  the  rates  of  convergence  of  S|R(v)|  and 


S|Rp(v)|  . 


» 
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Appendix 

4.A.J  Rate  of  fall-off  of  an  autoregressive  covariance  function. 

Suppose  (X(t) , t e 2}  is  a stationary  autoregressive  process  of 
order  p , i.e.,  there  exists  an  orthogonal  process  {e(t),teZ}  such 
that 


P 

x(t)  + £ a.  x(t-  j)  = e(t)  , 
j-i  JP 


then 


Y(t)  = ^X(t),X(t-l),...,X(t-P+l))T 
is  a Markov  process  and  as  such  its  covariance  function  is  of  the  form 
RyCv)  - Ry(G)  • [PY(1)]V  . 


Now 


^(v) 


R(v) 

R(v  + p - 1) 


R(v  - p + 1) 
R(v) 


where  R(v)  = E[X(t)  X(t+v)] 


and 


-1*8 


o 

0 

1 
0 


The  characteristic  polynomial  of  PY(1)  is  simP^y 

$<X)  = Xp  + E a Xp‘j 
j“l  JP 

which  has  also  been  referred  to  as  the  indicial  polynomial  of  the  con- 
stants {l,a,  , . . . ,CX  } . Pagano  (1972)  has  shown  that  the  stationarity 

J*r  Hr 

of  X(.)  implies  that  all  the  zeros  of  $(•)  are  strictly  within  the 
unit  circle. 

Thus  all  the  eigenvalues  of  Py(l)  have  modulus  less  than  1 . 

By  using  the  Jordan  canonical  decomposition  of  P^(l)  , it  i3 
seen  that 


'IP 


1 0 


PY(1)  - 


a2p  oio 


a , 0 0 

P-1,P 


PP 


0 0 


lio  [PY(1)J 
v-»® 


pxp 


and  by  the  same  token  that  R(v)  goes  to  zero  exponentially. 
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CHAPTER  5 
CONSISTENCY  STUDY 

The  Autoregressive  Method  as  an  Estimation  Method 


l 


In  this  chapter,  we  want  to  establish  the  consistency  of  the 
autoregressive  method  and  find  the  rate  of  consistency  '.n  terms  of  a 
relation  between  the  order  of  the  estimator  and  the  sample  size. 
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5.1  Convergence  of  R(«)  to  R(») 

# The  autoregressive  estimator  of  order  p depends  on  the  data 

A 

through  the  R(»)  subsequence  and  through  the  extreme  values  as  the 
data  is  rescaled  to  [-TT,TT]  . Typically 


R(v) 


! civ* 
-tt 


dFn(x) 


where  Fr(x)  is  usually  a step- function  and,  in  many  statistical  appli- 
cations, F^(x)  is  a function  of  the  empirical  distribution  function. 

(Fn(x)  , xe  [-TT,TT] 3 is  a stochastic  process  indexed  by  x . We 
assume  throughout  this  chapter  that  this  process  has  the  following 
properties: 


(1.1) 


ty^(Fn(x)  - F(x»  , xe[-rr,n]}  converges  to  a Gaussian 
process  with  mean  0 and  covariance  function  a(x,y)  , 
x,y  e [-TT,TT]  , where 

- F(»)  has  a derivative  f(*)  2 0 , integrable 

TT 

- < J log  f (x)  dx  < 03 
n :n 

- J*  77  t dx  < so 

-TT 

n tt 

< J J a(x,y)  dxdy  <■  ® 

-TT  -TT 

TT  TT 

- J E[F  (x)]  dx  -*  J F(x)  dx 

11 


-TT 
TT  TT 


-TT 


TT  TT 


It  / \ » * 

- J J n CovlF  (x),F  (y)J  dx  dy  -»■  J J <J(x,y)  dx  dy 

x -TT  -TT  ' ° n * -TT  -TT 


c 


Theorem  5.1: 


(1.2) 

where 


Under  condition  (1.1),  for  any  p s 1 , 


\n 


1 Re(R(v)  - 

R(v))^ 

D 

i Im  (R(v)  ■ 

- R(v))j 

n -»« 

A(v) 

B(v) 

S(v)  = 

B(v) 

C(v)_ 

A(v)  = A(0)  + 2 * 
TT  TT 

• (-1)V 

+ ! I 

2 . 
v sxn 

-TT  -TT 

A(0)  « a(TT,TT)  + 0(-TT,-TT) 
TT 

B(v)  = (-l)v  j v 

' COS  V} 

-TT 
TT  tt 

- 

■J  s 

2 

v sin 

-tt  -tt 

B(0)  =*  0 

TT 

IT 

C(v)  = J* 

; v2 

cos  V X 

-TT 

-TT 

MVN 


(fi.S(v^)  , |vj  * p 


TT 

f ' 

-TT 


C(0)  = 0 


(The  mode  of  convergence  is  in  distribution.) 
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Proof : 


R(v)  = J eiVXd  F (x)  , W1  5 P 

-TT 


Using  integration  by  parts,  we  obtain 


(1.3)  R(v)  = M)V[F(TT)  - F (-TT)  J iveivXFn(x) 

11  -TT 


dx 


Taking  expectation  of  both  sides, 


E[R(v) ] 


(-1)V  [F(r;  - F(-TT)]  - J iveivXF(x)  dx 


n-»® 


which  is  equal  to  R(v)  as  can  be  seen  by  integrating  by  parts  again. 

For  the  covariance,  we  verify  the  exactitude  of  B(v)  only;  the 
other  terms  can  be  found  in  the  same  way. 


n Cov  ^Re^R(v)  - R(v)]  , Im  [R(v)  - R(v)]J 


n E 


uJe  J COS  vxd^Fn(x)  - F(x))  • J ^sin  vyd^y)  - F(y)]j 


Integrating  by  parts,  we  obtain 


' f . TT  ^ % 

|(-l)V(Fn(TT)  - Fn(TT)]  - (-l)V[Fn(-TT)  - F(-Tr)j  + J^v  sin  vx  [fJx)  - F(x)J 

f TT  | 

• /-J  vcosvy[Fn(y)  - F(y);  dy' 

\ -TT  * . 


r 


-w- 


• i ? 

* 

It  is  easy  to  obtain  B(v)  from  that  point. 

Finally,  as  the  integral  of  a Gaussian  process  is  Gaussian,  we 
obtain  the  third  element  of  our  theorem,  i.e.,  asymptotic  normality. 
This  completes  the  proof. 

It  is  also  clear  that  R(v)  is  a consistent  estimate  of  R(v)  , 
that  is 


(1.4)  R(v)  ~ R(v)  , |v|  * p , p*l  . 

u-»® 

This  follows  from  Theorem  5.1  and  Chebyshev's  inequality. 

We  can  then  get  a multivariate  analogue  of  Theorem  5.1. 


Theorem  5.2: 


Under  condition  (1.1), 


V^r 


(\ 1 R.(r(0) 
Im^R(O) 

0 

Re(R(k) 
\Jlm(R(k) 


S (kj) 


, k * p 


where 


-Z45-- 


£(k) 

-[ 

3 / 

= 

V 

. . 

= 

S(j) 

defined 

JJ 

( 

— 

— 

Aj  l 

Bjt 

— 

31 

_v 

C., 

E. , 

E. 

13 

31 

Aj  l 

= (-l)j+t 

A(0)  +( 

j < i 


■r 


TT 

J l 3in-t,y(;7(TT,y)  - a( 
-TT 

. TT 

+ (“1;  J j sin  jy(a(TT,y)  - <j(-TT,y))  dy 
-TT 

TT  TT 

+ J J*  3l  sin  jx  sin-ty  a(x,y)  dx  dy 
-TT  -TT 


a i 


31 


A(0)  , as  in  Theorem  5.1 

. TT 

(-1)-5  f4l  cos  -ty(a(-TT,y)  - a(TT,y))  dy 
-TT 

TT  TT 

■n»  sin  jx  co3  ly  a(x, y)  dx  dy 
-TT  -TT 

l I*11 

(-1)  j j cos  jy  (j(-TT,y)  - a(TT,y))  dy 
-TT 

TT  r.17 

- J J 15  sin  lx  cos  jy  cr(x,y)  dxdy 

-TT  -TT 

J J 51  COS  jx  cos  ly  cj(x,y)  dxdy 
-TT  -TT 


'Ol 


bm.  - 0 


i 


Proot : 

The  proof  is  really  equivalent  to  that  of  Theorem  5.1  and  so  it  is 
omitted. 


U4- 


We  will  come  back  to  these  formulas  in  the  next  chapter  when  we 
examine  three  different  routes  to  density  estimation. 


t 


-167- 


5.2  Consistency  of  Ct  and  K 

~P  P 

It  would  be  possible  without  a doubt  to  carry  out  the  same  program 
as  Kromer  (1969).  But  our  main  interest  lies  in  relating  the  order  of  the 
autoregressive  estimator  to  the  sample  size. 


Lemma  5.3: 

For  any  p-dimensional  vector  x and  matrix  X (random  or  not) , 

^^P  P 

define 


Then, 


llx  II2  = E |x  |2 

~P  i=l  i* 

llx  II2  = I lx.  . | 2 

p i,J-l  13 


llx  II 

P H 


• x 


~P 


HI 


(2.1) 


llx. 


* 2SJI 
P 


llx  II 

p H 


llx  II 

~p 


(2.2) 


t 

IIXpllH  * llxpll  £ p • maxtjx^j  , ij  = l,...,p} 

- 

llx  II  s \TF  • max[|xi|  , i =*  l,...,p} 


i 


4 


-Ifcg 


(2.3)  If  X is  Hermitian  and  positive  definite,  then 

P 


lx  II  = X (X  ) 
p h max  p 


"V'h  ■ wy 


where  X (X  ) (X  . (X  ))  is  the  maximum  (minimum)  eigenvalue  of  *1 
max  p min  p p 


(2.4)  If  Yp  is  non- singular  and 


then. 


Jlx  - Y I!  s - 1 
p p H Ihf  lllu 

P H 

, 1Iy"1II 

Ilx_1l!  * -2-5  . 

pH  e 


Proof: 


Most  of  these  assertions  are  well-known.  For  (2.4),  sen  Davies 


(1973), 


Theorem  5.4: 


Let  f(»)  be  integrable  and 


m £ f(x)  £ M , a.e.  in 


Let  R (v)  * J eiVX  f(x)  dx  , 

-TT 
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Let 


Then 


R (0)  ...  R (P  - 1) 


R (1  - p)  ...  R (0) 


(2.5)  2"®  ‘ WV  S WV  4 M * *" 


and 


U“  \»in<V 

p-»oo 


= m 


• 2TT 


(2.6)  { 


U”  WV  * M * 2TT 

p-><» 


Proof: 


See  Grenander  and  Szego  (1958),  Chapter  5. 


Theorem  5.5: 


If  0 < m £ f(x)  ^ M < ® , 


lim 

n-»® 


0 , 


a.e.  in  [-TT,TT]  and 


then 

€ 


- no 


if 

i 


€ 


(2.7) 


a 


P 

> 

n -»  os 


0 


K . .i 
P(n) 


0 

n-*«e 


Note:  From  now  on,  we  will  use  p instead  of  p(n)  to  simplify  the 

notation,  but  with  the  understanding  that  p is  allowed  to  increase  with 
n at  the  rate  specified. 


Proof : 


Let 


a 

~p 


a 

~p 


r 

~P 


r 

~P 


R 


P 


R 

P 


(a.  ) 

lp  pp 


(a.  ) 

lp  pp 


(R(-l),...,R(-p))' 

(R(-l),...,R(-p))' 

1(0)  ...  R(p-i) 

• • 

• • 

* * 

_R(l-p)  ...  R(0) 

A A 

R(0)  ...  R(p-l) 

• * 

• m 

A A 

R(l-p)  ...  R(0) 


1 

5 


- I7t  - 


We  then  have 


(2.8) 


A A -«  j-  A A 

a - a = R l(r  - r ) + (R 
~p  ~p  p L ~p  ~p  p 


- R ) a I 
P ~PJ 


Tnis  is  true  because  the  Yule-Walker  equations  can  be  written 


r a 
P~P 

- r 
~P 

A A 

A 

r a 

- r 

p ~p 

~P 

A 

A -a 

n 

A A 

1 

lia 

- a II  s II  r_1II  • 

II  r - r ||  + II  R - 

• 

P 

~p 

~p  pH 

~p  ~p  p 

k. 

P ~P  j 

Now, 

Hr  - r llu  * p 

p p H 

A 

• raax{|R(v)  - R(v)| 

, |v|  ^ p] 

i , by  (2.1) 


By  Theorem  5.2, 


(2.9)  |R(v)  R(v)|  - 0p(-^) 


meaning  that  R(v)  converges  in  probability  to  R(v)  independently  of 
v at  the  rate  of  1/^nT  as  n ® 

Thus 


R - R I 
P P 


0 

\F; 


Now  as  lira  = 0 there  exists  € > 0 such  that 

n ) 

n -»  oo  v 


-£r  < -Llr-  = (l-e)  • X . (R  ) ..  by  (2.3) 

yfn  |iR  XH  mir  P 

P H 


5 (1  - £)  • M • 2tt  by  Theorem  5.4 


A , Ilff'lL 

so,  I!r"1!I„  s — E-S 

pH  e 


, by  (2.4) 


* i"T7(R  ) ’ b*  <2-3> 

min'  p' 


£ * m • 2rr 


, by  Theorem  5.4  . 


Thus, 


lift  - a II  as- 

~P  ~P  e 


-ril 

• m *27^  ~p  ~p 


+ Hr  - r !l  ♦ l!a  II1 

p P ~p  . > 


I P 2 

I m \ 1 *- 


li$t  II  = \/  S la. 

**d  » . i ir> 


j=l  Jp  p -»  oo  j=l 


V S 1 a . | < ao  , by  Theorem  3 . 3 


JEp  - Epii  * /p  * max  (|R(v)  - R(v)|  , |v|  £ p}  , by  (2.2) 


0pW 


, by  (2.9)  ; 


Hr  - r II  -of -7§=rl 

p p P[fiT) 


, b-  (2.2)  and  (2.9) 


t 
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So 


11a  - a 11  = o ' 

~p  ~p  p . ; n ' 


i e lla  - a 11  o 

1*e> » <-p  ~p  n -»  oo 


if 


_£_  > o 


/=■  n 


For  the  second  part  of  the  theorem. 


K - K * £ a R(j)  - S a R(j)  , *y 

j=0  JP  Jr 


P p 4=n  3P  j=0 


= £ (a  (R(j)  - R(j»  + R(j)  (ajp  - ajp'}  . 

j=0  3P 


A 

!k 


-K|sS  Cl«,  I • |M3)  - *0>l  + !<U3>  • Kp-  ajpH 

P P :_n  JP 


S ( £ |a.  j2^  • ^pT-max  {|R(j)  - R(j)|  » U1  ^ 
\j=0  3P  ) 

+ I £ lR(j)l2f  * "®p  -“p11 
lj=0  3 

s(i+isf°<fl + * °*{t)  ■ 


Now,  we  certainly  have 


TT 


J*  |f(>:)l2dx  * 2TT  . M2  < <*> 
-TT 


and  so 


-174- 


p 2^ 

S 1 R(j) |2  j - 
j=0  j 


S |R(j)|  | < <° 

j=0  J 


(CR(j)}j_Q  is  the  set  of  Fourier  coefficients  of  f(»)  ).  Finally,  as 


iia  - a II  ■>  o 

~p  ~p 


HOpll  — > CSJ^i2)  < » 


so,  ]Kp-Kpl  = 0p(^r)  . 


This  completes  the  proof. 


The  0p(«)  notation  was  introduced  by  Mann  and  Wald  (1943). 

* A * / , 

Redefine  a = (a  , . . . ,a  ) and  a = (a  , . . . ,tt  ) . Theorem  5 . 5 

~p  op  ’ pp'  ~p  op’  * pp 

A 

still  holds  because  a ® a = 1 , and  it  follows  from  it  that 

^ p 

Http  - ttll  > 0 , where  a = (tt^  ,a^  ,ft2 , . . . / , because 


lla  - all  s lla  - a |l  + lla  - ail 

D **  «--D  ~ 


lla  - all2  = f S (a.  - a.)  eljx 

~P  ~ -TT  j=0  ^ J 


dX  * <ajP  = ° ’ 


” • J krcs  - a*)  • f(x) 

m . jp  j 


I • [ihT(eix)  -<P>i:V  I1-  7^  •«<C1X)llJ2- 

® L p F J p F 

— ■>  0 , by  Theorem  3.4,  (3.3.0.12)  and  (3.2.1.16). 


i 
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Also, 


Hk  - Kii  — — > 0 

p 


as  Kp  -»  K by  (3.3.0.12) . 


Berk  (1974)  also  proves  the  consistency  of  the  autoregressive 

3 

coefficients,  but  his  rate  of  consistency  is  p /n  and  he  has  to  make 
the  further  assumption  that 


P 


L 

j=p+l 


> 

P “*  CD 


0 


(see  his  Lemma  3 and  equation  2.17). 

2 

We  only  require  p /n  -»  0 . Even  though  our  contexts  are  different, 
our  proof  can  be  adapted  to  Berk's  context. 


"■n 


C 


- life' 


5.3  Consistency  of  f (•) 


From  Theorem  5.5,  we  will  now  obtain  results  that  parallel  those 
of  Chapter  4. 


Lemma  5.6: 


3/2 


then 


If  0 < m £ f(x)  s ® , a.e.,  in  t“TT»Tr]  and  lim  — — = 0 , 

n->oo  yj  n 


|cp*(e*x)  - cp* (e*X) i — - — ■>  0 , uniformly  in  x . 

1 p p 1 n -» oo 


Proof: 


t 


*,  ix. 
cp?(e  ) 


* , ix. 
c?p(e  ) 


i F a 


J K j=0 

P 


Z a,  eijx  - -J-  £ a.  eijx 


JT  j=Q 

P 


yrA  (cvveijx+ 

p 


Eo. 

Kp  •/ru- o jp 


£ la.  -a,  I 

_V  jP  jP1 


i ix.  *.  ix . | , 1=0 

|CP  <e  )-(p  (.  )|  s x-  pr 

H w V K 


/r  /r 

p p 


s la  | 

j=0  Jp 


/p-*  !1V2p1! 

7T"^  + 

p 


1 1 


/r  yr 

p p 


/r  itepii 
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As  Ik  - k I — — > o , |/k~  - JT I — — > o 

* n n i 7 ‘ D D ‘ 


because  is  a continuous  function  and  | - -y====  | — — > 0 

v K 


because  the  reciprocal  is  a continuous  function  and 


0 < JT  < JUT  < {£[  < co  , by  (3.3.0.12)  . 


Thus 


t *.  ix.  *.  ix.  t 
|cpp(e  ) - cpp(e  )\  = Op 


3/2 


«/T 


This  completes  the  proof. 


Theorem  5.7 : 


If  0 < o £ f(x)  5 M<®  , a.e.  , in  [-TT,tt]  and 

3/2 

lim  — r ' 0 , t^n 

n-»eo  v'n 


f 

-TT 


f (X)  f 00 

P 


dx  — *— > 0 

n cd 


Proof : 


T* 

-TT 


I 


f (X)  fOO 

P 


dx  s J 
-TT 


fp(x)  fp(x) 


dx  + J* 

-TT 


fp(x) 


f(x) 


dx 


By  Theorem  4.1,  with  the  addition  that  f(*)  is  essentially 
bounded,  we  have  that 


TT 

s 

-TT 


fp(x) 


f(x) 


dx 


p->  < 
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-» 

* 


On  the  other  hand, 


TT 


tn 


. f 


-TT 


c / \ f (x) 
fp(x)  p' 


dx 


TT 


* 2H  S (|cp_(elX)|  + |Cp  (e1X)  j ) 
-TT  F F 


|cp*(e1X)  - Cp*(elx)  J • f(x)  dx 


^2IT  f(J  |ip*(eix)j2«  f(x)  dx)%  + (J  \cp*(eix)|  • f(x)  dx)% 
L -TT  P -TT  P 


As 


and  so 


’ [ / * 9p(e1X)J2  • f(x)  dxl  ^ 


jcp*(eiX)  - Cp^(eix)|  — — 0,  uniformly  in  x , 


}Cp*(e*X)  - cp*(e*x)|2  — — > 0,  uniformly  in  x 


J*  l^elX)  " * f(x>  dx  0 

-TT  P P 


We  also  have  that 


2 Hr," 

• f(x)  dx 


- J"  |<Pp(e15C)|Z  • f(x)  : > o 


c 


because  the  square  function  and  the  integral  are  continuous  transforma- 
tions. 


i.  — wbm  jjbt  > 
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* ixxl2  - ■ - \* 


Finally,  (J  lPp(e1X)p  • f(x)  dx)  = 1 , by  (3.2.1.16) 
Thus , 

/_3/2  j 


TT 


1 

; 


-TT 


% , v f (x) 

fp(x)  PV 


dx  = CL 


WT 


/ 


This  completes  the  proof. 


Theorem  5.8: 


3/2 

Under  condition  (4. 1.1. 4)  and  if  lim  ^ 

n~>co  J n 


= 0 


then 


Proof: 


TT 


-TT 


fp(x) 


1 

f(x) 


2 P 

dx  — > U 


TT 

I 

-TT 


I 


1 1 

2dx  X f 
-TT 

1 1 

2 r71 

dx  + j 
-TT 

1 1 

?.(»)  f« 

e , \ f (x) 
f (x)  p' 

fp(x)  f(x) 

p 

P 

dx 


By  Theorem  4.2,  we  have  that 


fp(x) 


->  0 . 

p -*  a> 


On  the  other  hand, 

t 


as 


«a.i«T  r i 


-IBl  - 


This  completes  the  proof. 


Theorem  5.9: 


Under  condition  (4. 1.2.1)  and  if 


2TT  * jir(e*X)|2  — — > 0 , uniformly 


fp(x) 


*-•'  |TT(eL)|2 


->  0 , uniformly 


Proof: 


~~  - 2Tt  • |TT(eix)|2  s jx-1— 
£p(x)  I fp(x) 


_i_  + I_JL_ 

Vl)  ! Vx) 


By  Theorem  4.3, 


rfe  - 2"  * ln(clxM2 

P 


P -»  o 


On  the  other  hand, 


182' 


c / N f (x) 

fp(x)  PV 


s 211 ( l^pCe134)  + cpp,eix)|)  •|cpp(eiX)  " cpp( 


* 2tt<2C  + Op(l>)  • 0p 


3/2 
B 


as  in  Theorem  5.8. 
Thus, 


1 1 

a 0 

fp3/z\ 

C , . f (x) 

fp(x)  Px 

* 

1\f) 

For  the  second  part  of  the  theorem. 


VX)  - 2tT 


|n(el3l)|2 


fp(x) 


fp(x) 


fpW  - 2S 


By  Theorem  4.3, 


fp(x) 


_1_ 

2rr 


|TT(eiX)|2 


p a> 


->  0 


iy*> 


- £ (x)|  — — > 0 

D 1 n-*eo 


|TT(eiX)|2 


because  the  reciprocal  function  is  a continuous  function  and 


CHAPTER  6 


THREE  WAYS  TO  DENSITY  ESTIMATION 


6.1 


The  Three  Ways  and  the  Basic  Assumptions 


We  assume  from  the  start  that  we  want  to  approximate  the  density 
of  a bounded  (1.1)  random  variable  whose  range  is  taken  to  be  [-TT,TT] 
without  loss  of  generality.  Note  that  we  can  always  replace  the  word 
''approximate''  by  "estimate."  Also,  we  will  work  on  the  natural  interval 
of  definition  of  each  function. 

The  three  basic  ways  to  approximate  a density  are: 

— the  direct  approach:  approximate  f(») 

— the  sparsity  approach:  approximate  q(*)  the  derivative  of  the 

quantile  function  and  form  f(Q(t))  » - , where 

Q(t)  = J*  q(u)  du 
0 

— the  hazard  approach:  approximate  h(»)  the  derivative  of 

x 

(- log  (1  - F(»))  and  form  f(x)  =>  h(x)  • exp  (- J h(u)  du) 

-TT 

The  distribution  function  F(»)  is  always  a bounded  nondecreas- 
ing function  and,  under  (1.1),  the  quantile  function  Q(*)  is  also 
bounded  nondecreasing,  but  the  integrated  hazard  H(»)  is  unbounded 
though  nondecreasing. 

Thus  H(»)  does  not  really  fit  in  here  even  though  we  have 
obtained  good  empirical  results  in  Part  I of  our  research,  it  is  to 
be  noted  that  we  never  attempted  to  approximate  h(«)  on  the  whole 
range  [-tt,tt]  but  rather  on  [-TT,n-e]  . 


- I 8(o  - 


The  log-integrability  condition  can  then  be  expressed  in  various 

ways: 


(1.2) 


/ 

TT 

J*  log  f(x)  dx  =» 

-TT 

J log  c(t)  dt  = 

0 

< 

n-e 

J*  log  h(y)  dx 

-TT 

^rr-e 

J log  h(x)  dx 

-TT 


1 

J - q(t)  • log  q(c)  dt 
0 

TT 

J f(x)  • log  f(x)  dx 

-rr 


C 108 


dx 


1-6 

j -q(t)  • log  {(1  - t)  • q(t)}  dt 
0 


If  the  interval  [ -tt,tt]  is  replaced  by  (•*“,“)  , then  log  q(t) 
integrable  is  the  weakest  assumption.  That  gives  more  weight  to  the 
indication  that  the  sparsity  approach  might  be  preferred  in  "tough"  sit- 
uations as  seen  in  Fart  1.  Another  reason  to  prefer  the  sparsity 
approach  is  that  the  sparsity  is  naturally  defined  on  the  bounded  inter- 
val {0,1]  . The  only  problem  is  then  the  unboundedness  of  the  quantile 
function,  i.e.,  q(«)  will  not  be  integrable.  But  research  has  started 

to  extend  the  applicability  of  the  autoregressive  method  in  that 
direction. 


t 


6.2 


The  Empirical  Processes 


In  Section  5.1  we  have  imposed  certain  conditions  on  the  empirical 

process  {F  (x) , x € t-TT,rr] } from  which  we  estimate  the  R(»)  sequence, 
n 

We  now  illustrate  what  the  general  formulas  look  like  in  the  case  of 
density  estimation. 


6.2.1  The  Direct  Approach 

In  the  direct  density  estimation  case,  Fr(x)  is  the  empirical 
distribution  function.  It  is  well  known  that  )[ri  {Fn(x)  ~ F(x)}  con- 
verges to  a Brownian  Bridge  process,  that  is  to  a Gaussian  process  with 
mean  zero  and  covariance  function 


(1.1) 


o(x,y)  = F(x)(l-F(y))  , x S y . 


Using  the  same  notation  as  in  Section  5.1,  we  have  for  instance  that 


(1.2) 


TT  TT 

A(v)  - j J*  v sin  vx  sin  vy  a(x,y)  dxdy 
-TT  -TT 


TT  TT 

Bu  " - J*  J*  sin  j x cos  ly  cr(x,y)  dxdy 

J -TT  -TT 


The  other  formulas  can  easily  be  guessed  from  those  two. 
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We  can  simplify  them  even  more  when  we  carry  out  the  integration: 


A(v)  = Var  (cos  v X) 


(1.3) 


= Cov  (cos  j X,  sin -tX) 


where  X is  distributed  according  to  F(*)  . Also  we  find  that 


(1.4)  n Cov(R(j)sR(t»  — > R(j  - l ) - R(j)  R(~t) 


which  can  be  written  as 


I”11  c/  \ j ^ ijx  ci  \ j Pn  ci  - > 

J e 'J  7 f(x)  dx  - j e J f(x)  dx  • J e f(x;  dx  . 


This  can  be  contrasted  with  results  obtained  in  time  series  analysis  as 
in  Kroner's  dissertation  (1969).  Let  us  note  first  that  Kromer  was 
estimating  the  spectral  density  of  an  observed  real  time  series  whereas 
we  are  estimating  the  spectral  density  of  a complex  hypothetical  time 
series.  In  his  Theorem  3.2,  Kromer  shows  that 


(1.5)  n Cov  (R(j),R((,))  > 2TT  • J (e1^^*  + e1  . f2(x)  dx 


where  f(»)  is  the  spectral  density  of  the  real  time  series. 


f 


M 
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We  emphasize  that  the  important  difference  is  not  between  real 
and  complex,  but  between  observed  and  hypothetical.  That  is  the  differ- 
ence that  forces  us  to  use  a different  type  of  estimator  of  the  R(*) 
sequence. 

So  our  analogy  between  density  estimation  and  spectral  density 
estimation  breaks  down  at  the  point  where  we  evaluate  the  variance  of 
the  estimators. 

6.2.2  The  Hazard  Approach 

In  the  hazard  approach,  the  empirical  process  that  we  have  used 
is 


(2.1)  log  (1  - ^y-Fn(x))  , 

where  F (x)  is  the  empirical  distribution  function  as  in  6.2.1. 
n 

By  using  the  delta-method,  we  can  find  its  limiting  distribution 
to  be  Gaussian  with  mean  leg  (1  - F(x))  and  with  covariance  function 

(2.2)  0<x,y)  = Fffi)  , x * y , 

(l  - F(x))Z 

Note  that  a(x,y)  ^(xj-fr'P " * 
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The  form  (2.1),  though  satisfactory  in  practice,  is  quite  unsuitable  for 
theoretical  study.  We  need  a procedure  that  will  never  take  us  arbi- 
trarily close  to  F(x)  = 1 . 

6.2.3  The  Sparsity  Approach 

In  the  sparsity  approach,  we  use  the  empirical  quantile  process 
{Qn(t)  , 0 i t s l]  . In  the  literature  on  order  statistics,  we  find 
that  \JHT (Qn(t)  - Q(t))  is  a Gaussian  process  with  mean  0 and  covar- 
iance function 


(3.1) 


C(t1,t2)  = t1(l  - t2)-q(t1)  * q(t2)  , 0 £ * t2  £ 1 


provided  that  q(*)  is  continuous  and  finite  (see  for  instance  Cox 
and  Hinkley  (1974),  Appendix  2). 


There  is  no  real  simplification  obtained  in  trying  to  carry  out 
the  integration  as  in  (1.3).  But  we  can  still  estimate  the  variances 

A 

and  covariances  of  the  R(*)  sequence  by  using  an  estimator  of  (3.1) 
in  formula  (5.1.1. 2).  This  last  formula  is  given  on  [-TT,TT]  so  car^ 
has  to  be  taken  in  reformulating  (3.1)  : 


c(x,y) 


5±s UZZJL  . 


2TT 


2tt 


2TT 


6.3 


Conclusion 


We  have  started  Part  II  with  some  questions  carried  over  from 
our  empirical  experience.  Can  we  now  provide  some  answers? 

For  example,  we  have  uncovered  the  "odd-even"  phenomenon,  but  we 
cannot  explain  it  as  such.  It  does  not  appear  very  markedly  in  estima- 
tion problems,  probably  because  the  estimators  we  start  with  are  very 
crude. 

Symmetrization  works  very  well  in  the  exponential  case,  but  not 
so  well  for  the  chi-square.  One  reason  could  be  the  effect  it  has  on 
the  characteristic  function. 

For  an  exponential,  the  characteristic  function  goes  to  zero  as 
1/ v , but  its  real  part,  which  is  the  characteristic  function  of  the 
symmetrized  exponential  (the  Laplace  distribution),  goes  to  zero  as 
1/v2  . 

For  the  chi-square  with  4 degrees  of  freedom,  the  characteristic 

2 

function  goes  to  zero  as  1/v  and  so  does  its  real  part.  Thus  there 
is  no  real  gain  in  symmetrizing  in  such  situations. 

We  have  accumulated  quite  a lot  of  evidence  as  to  transformations 
of  data  with  regard  to  densi^  estimation.  The  three  approaches  we 
used  are  not  exactly  equivalent.  In  Section  6.1,  we  underlined  the 
assumptions  behind  each  and,  i.i  Fart  I,  we  stressed  the  relations 
between  the  approaches  and  qualitative  aspects  of  the  data. 
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The  best  conclusion  in  practice  would  be  to  use  the  different 
approaches  on  the  same  data.  So  far  it  seems  that  the  quantile  approach 
is  quite  robust.  We  illustrate  this  statement  by  pictures  of  the 
exponential  and  the  normal  obtained  via  the  quantile  approach  (see 
Fig.  1 and  2).  In  Figure  1,  we  have  averaged  the  exponential  approxi- 
mators of  orders  8 and  9 . In  Figure  2,  we  have  the  normal  approxima- 
tor of  order  9 . 
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